MongoDB has its own way of storing documents in a collection, and guaranteeing that each document is unique is an important part of that design.
So before we move on to more advanced MongoDB topics, it is important to understand what _id and ObjectId are.
In relational databases such as Oracle, DB2, or MySQL, we rarely have to supply the primary key value when we insert a row, because the underlying database usually generates it for us, most likely from an auto-incrementing sequence.
The same idea applies in MongoDB, although not with a simple auto-increment: when we insert a new document into a collection, we are not forced to provide the primary key. In MongoDB the primary key is the _id field. If you leave it out, MongoDB provides it for you, and the default type of the generated value is ObjectId. Two documents in the same collection (the logical equivalent of a table in an RDBMS) can't have the same _id, and if you try to insert a second document with an _id that already exists, MongoDB will reject it.
Note that ObjectId is only the default type. You can use almost any data type for the _id field (an array, for example, is not allowed), and it will work as long as the value is unique within the collection. One more thing to notice: in a relational database you can give your primary key any name, but in MongoDB the primary key is named _id.
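To make the uniqueness rule concrete, here is a minimal toy sketch in Python. It is not the MongoDB API: `ToyCollection`, `DuplicateIdError`, and the random 12-byte stand-in for a generated ObjectId are all hypothetical names and choices made up for illustration. It only mimics the behaviour described above: a missing _id gets generated, any hashable value can serve as _id, and a repeated _id is rejected.

```python
import os


class DuplicateIdError(Exception):
    """Raised when a document reuses an existing _id (hypothetical name)."""


class ToyCollection:
    """In-memory stand-in for a collection; this is not a MongoDB API."""

    def __init__(self):
        self._docs = {}  # maps _id -> document

    def insert(self, doc):
        doc = dict(doc)  # copy so the caller's dict is not modified
        if "_id" not in doc:
            # Assumption: 12 random bytes stand in for a generated ObjectId.
            doc["_id"] = os.urandom(12).hex()
        if doc["_id"] in self._docs:
            raise DuplicateIdError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]


users = ToyCollection()
generated_id = users.insert({"name": "alice"})            # _id is generated
users.insert({"_id": "bob@example.com", "name": "bob"})   # string _id
users.insert({"_id": 42, "name": "carol"})                # integer _id

try:
    users.insert({"_id": 42, "name": "dave"})             # same _id again
except DuplicateIdError as exc:
    print("rejected:", exc)
```

A real collection enforces the same rule through the unique index on _id rather than through a Python dictionary, but the effect seen by the caller is similar: the second insert with _id 42 fails and the first document stays untouched.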
I also looked through the MongoDB docs for a way to change this key name, but I could not find one. If anybody knows how to change it, please share it with us.
Now let's look at what is special about ObjectId, the default data type for _id. One thing to keep in mind is that MongoDB is designed to scale data horizontally (a sharded environment), which means your data is going to reside on multiple machines. If MongoDB used a traditional auto-incrementing value like an RDBMS, keeping the primary keys synchronized across all those machines would be time-consuming and would hurt the performance of your application.
To avoid this bottleneck, MongoDB uses ObjectId as the default, which takes 12 bytes of storage. These 12 bytes are divided into 4 parts so that you still get a unique _id value in cases such as:
- _id values generated on two different machines.
- _id values generated on the same machine but by two different processes.
- _id values generated on the same machine, by the same process, and in the same second.
The same scheme produces unique values in other cases as well.
Now let’s have a look at how these 12 bytes are divided:
Bytes: 1 2 3 4 | 5 6 7 | 8 9 | 10 11 12
- 1-4 : First 4 bytes :- The first four bytes of the ObjectId store the time, in seconds, that has passed since the Unix epoch.
- 5-7 : Next 3 bytes :- These three bytes identify the machine on which the ObjectId is generated. They are derived from an MD5 hash of the machine's host name. With these 3 bytes, two different machines are very unlikely to generate the same ObjectId.
- 8-9 : Next 2 bytes :- These two bytes store the id of the process that is generating the ObjectId. With these 2 bytes, two different processes on the same machine won't generate the same ObjectId.
- 10-12 : Next 3 bytes :- These three bytes store an incrementing counter value. With these 3 bytes, ObjectIds generated within the same second, on the same machine, and by the same process are still different from each other.
Note that this is the original layout. Newer MongoDB versions replace the machine and process bytes (5-9) with a 5-byte random value that is generated once per process, while the timestamp and counter parts stay the same.
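The four parts described above can be put together in a few lines of Python. This is only a sketch of the layout as this post describes it, not the code any driver actually uses: the function name `make_object_id`, the byte order of each field, and the random starting value of the counter are assumptions made for illustration.

```python
import hashlib
import itertools
import os
import socket
import struct
import time

# Process-wide counter, started at a random 3-byte value (an assumption).
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def make_object_id(now=None, hostname=None, pid=None):
    """Build a 12-byte id: 4-byte time, 3-byte machine, 2-byte pid, 3-byte counter."""
    timestamp = int(time.time() if now is None else now)
    hostname = socket.gethostname() if hostname is None else hostname
    pid = os.getpid() if pid is None else pid

    # Bytes 1-4: seconds since the Unix epoch, big-endian.
    ts_bytes = struct.pack(">I", timestamp & 0xFFFFFFFF)
    # Bytes 5-7: first three bytes of the MD5 hash of the host name.
    machine_bytes = hashlib.md5(hostname.encode("utf-8")).digest()[:3]
    # Bytes 8-9: low 16 bits of the process id.
    pid_bytes = struct.pack(">H", pid & 0xFFFF)
    # Bytes 10-12: counter, wrapped to 3 bytes.
    counter_bytes = (next(_counter) & 0xFFFFFF).to_bytes(3, "big")

    return ts_bytes + machine_bytes + pid_bytes + counter_bytes


first = make_object_id()
second = make_object_id()
print(first.hex())
print(second.hex())
```

Two ids made back to back in the same process usually share their first nine bytes (same second, same machine, same process) and differ only in the counter, which is exactly the third case in the list of scenarios above.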
The Javadoc for the ObjectId class shows that it also implements the Comparable interface, which is needed because _id values are indexed by default.
Below are a few other points worth noting about ObjectIds and _id.
- All documents have an _id (primary key), except in some system collections and, in older MongoDB versions, capped collections.
- Most of the time the _id is generated on the client side by the driver. This is a MongoDB design choice that shifts as much load as possible from the server to the client.
- An ObjectId also tells you when it was generated, to the nearest second, so in many cases you no longer need a separate created_date or insert_date field in your document.
- ObjectIds increase as time passes, so sorting on ObjectId roughly sorts your documents by creation time.
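The last two points are easy to check by hand, because the timestamp is simply the first 4 bytes of the id. Below is a small Python sketch that works on the 24-character hex form of an ObjectId. The helper name `creation_time` and the three sample ids are made up for illustration, and the sketch assumes the timestamp is stored big-endian.

```python
from datetime import datetime, timezone


def creation_time(object_id_hex):
    """Return the UTC time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(object_id_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up ids: two share a timestamp and differ only in the counter.
ids = [
    "53e4a5000000000000000002",
    "53c0f1800000000000000001",
    "53e4a5000000000000000001",
]

# Equal-length lowercase hex strings sort in the same order as the raw bytes.
ordered = sorted(ids)
for oid in ordered:
    print(oid, creation_time(oid).isoformat())
```

Because the timestamp occupies the leading bytes, sorting the ids also sorts them by the second in which they were generated. Ids from the same second fall back to the remaining bytes, so the order within one second is only meaningful for ids that come from the same process.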