MongoDB v0.8

We've just released (well, last week...) the first version of MongoDB, an open source, high-performance [and now I'm going off official company script] "queryable persistent cache". OK, it's a database, but I've discovered that when I introduce it as that, all of the preconceived notions, assumptions, use cases, tools, problem domains, etc. that every programmer has after working with RDBMSs completely confuse the discussion.

This release is really a baseline release for us at 10gen after we refocused the company on Jan 1 on the persistence layer of our appserver stack. The appserver will continue as an Apache Licensed project (http://www.babbleapp.org).

This release contains :

  • The MongoDB database (of course)
  • A new, slick, Google V8-based command-line client that lets you interact with the database in JavaScript.
  • Basic tools like import/export, backup/restore.
  • Drivers for Java, Ruby, Python and C++, with a PHP driver probably becoming available today (although it isn't quite releasable yet).
  • A RoR ActiveRecord Connector.
  • An implementation of the ActiveRecord pattern in Ruby (not to be confused with the RoR AR component).

So this think-of-it-not-as-a-database database has some interesting properties. It stores JSON-like documents. I say "JSON-like" because rather than just strings, numbers and booleans, it can store other types like dates and binary data, and it distinguishes between integer and floating point numbers. It's pretty quick - on my Mac laptop, I can do 300k inserts/sec from a Java client (doing them in small blocks of 100 documents per network message), and random reads at about 30k/sec. (Awake readers will note that I'm not transactionally persisting that much data to disk at that rate... disks don't go that fast... a subject for another post). I can do fancy indexing on the "documents" - not just on a primary key, but also on fields inside sub-objects. E.g., if I have a document that in JSON would be structurally represented as :

{
   foo : {
       bar : ....,
       woogie : ....
   },
   x : ....
}

I can create indexes on things like foo.woogie. I can have multiple indexes per collection (think of a collection like a table).
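To make the "index into sub-objects" idea concrete, here is a rough sketch in plain JavaScript of what addressing foo.woogie means. It is an illustration only, not how MongoDB builds its indexes; getPath and buildIndex are helper names I made up for this example, and the sample documents are invented.

```javascript
// Illustrative only: plain JavaScript, not MongoDB's index implementation.
// getPath and buildIndex are hypothetical helper names.

// Resolve a dotted path such as "foo.woogie" against a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Map each value found at `path` to the list of documents that hold it.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    const key = getPath(doc, path);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

// Made-up sample documents with the same shape as the example above.
const docs = [
  { foo: { bar: 10, woogie: "a" }, x: 1 },
  { foo: { bar: 20, woogie: "b" }, x: 2 },
  { foo: { bar: 10, woogie: "a" }, x: 3 },
];

const byWoogie = buildIndex(docs, "foo.woogie");
console.log(byWoogie.get("a").map((d) => d.x)); // [ 1, 3 ]
```

The point is only that a lookup by the sub-field no longer has to scan every document; a real index would be an ordered on-disk structure rather than an in-memory Map.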

It also has a rich query language that lets you do a lot of the things that you'd expect when coming from a SQL background, and lets you express those queries in a way that is compatible with thinking in the document structure you're working in (in JS notation with the "what I think about in SQL" above it in the comment):

  //  select * from mycollection where foo.bar == 10
  db.mycollection.find({ "foo.bar" : 10 });
  //  select x from mycollection where foo.bar == 10 skip 10 limit 10 order by foo.woogie
  db.mycollection.find({ "foo.bar" : 10 }, { x : 1 }).skip(10).limit(10).sort({ "foo.woogie" : 1 });

The first example finds all documents in the mycollection collection where the value of bar in the foo element is 10. The second example goes further: it orders the matches by the woogie subfield of foo, skips the first 10, returns at most the next 10, and limits what comes back to partial documents that contain only the x field.
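If it helps to see those semantics without a server running, here is a small in-memory imitation of the second query in plain JavaScript. It is a sketch under my own assumptions: query and getPath are hypothetical helpers (not part of any MongoDB driver), the data is made up, and the sketch applies the steps in the order filter, sort, skip, limit, project.

```javascript
// Illustrative only: an in-memory imitation of the second query's semantics.
// getPath and query are hypothetical helpers, not a MongoDB API.

function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

function query(docs, { where, fields, sortBy, skip = 0, limit = Infinity }) {
  return docs
    // keep documents whose dotted paths equal the requested values
    .filter((doc) =>
      Object.entries(where).every(([path, v]) => getPath(doc, path) === v)
    )
    // ascending sort on one dotted path (filter already returned a copy)
    .sort((a, b) => {
      const x = getPath(a, sortBy);
      const y = getPath(b, sortBy);
      return x < y ? -1 : x > y ? 1 : 0;
    })
    // skip, then limit
    .slice(skip, skip + limit)
    // return partial documents holding only the requested top-level fields
    .map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));
}

// Made-up data: 30 documents, every other one has foo.bar == 10.
const rows = [];
for (let i = 0; i < 30; i++) {
  rows.push({ foo: { bar: i % 2 === 0 ? 10 : 99, woogie: 30 - i }, x: i });
}

const page = query(rows, {
  where: { "foo.bar": 10 },
  fields: ["x"],
  sortBy: "foo.woogie",
  skip: 10,
  limit: 10,
});
console.log(page.map((d) => d.x)); // [ 8, 6, 4, 2, 0 ]
```

Only 15 documents match here, so after skipping 10 the "page" holds the remaining 5 rather than a full 10.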

Also, you can do document updates - rather than replacing the whole document if you want to modify it (which is a horror show if you have large documents), you can just update elements of the document in-place :

// update mycollection set total = 10 where id = 12345
db.mycollection.update({id:12345}, {$set:{total:10}});
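To show the difference between replacing a document and modifying it in place, here is a rough plain-JavaScript sketch of what a $set-style change does to a single document. applySet is a hypothetical helper I wrote for this illustration, not a MongoDB function, and letting it accept dotted paths is my own assumption for the sketch.

```javascript
// Illustrative only: what a $set-style update does to one document.
// applySet is a hypothetical helper, not a MongoDB API.
function applySet(doc, changes) {
  for (const [path, value] of Object.entries(changes)) {
    const keys = path.split(".");
    const last = keys.pop();
    let target = doc;
    // walk (and create, if missing) the intermediate sub-objects
    for (const key of keys) {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      target = target[key];
    }
    // touch only the named field; everything else stays as it was
    target[last] = value;
  }
  return doc;
}

// Made-up document standing in for the one with id 12345.
const order = {
  id: 12345,
  total: 7,
  items: ["a", "b"],
  customer: { name: "Ann" },
};

applySet(order, { total: 10 });
console.log(order.total);        // 10
console.log(order.items.length); // 2
```

The rest of the document (items, customer) never has to travel over the wire or be rewritten, which is the whole appeal when documents get large.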

MongoDB also has some nice replication and semi-HA master pairing features, and sharding is on the way.

What's it good for? Well, as I argue when people give me the chance to speak about it, databases are changing - just look at what is available in the so-called "cloud" arena. It tends not to be an RDBMS if it's scalable. Look at the storage engine under AppEngine, or Amazon's SimpleDB, or any of the Dynamo implementations, etc., all of which change your programming model to one that isn't "tables and joins". Or look at the excellent CouchDB, a JSON store. If the RDBMS isn't being replaced outright (like it has to be in "the cloud"), it can be augmented with other persistence technologies that are better suited for a portion of the data requirements of a system.

So what's it good for? It works fine as a database, but you can't think relational. If you want to just replace MySQL with something else, but don't want to rethink your data model, MongoDB isn't for you. Because of its pedigree and initial design requirements, it works very well as an "object" store for dynamic languages. JS objects, Python and Ruby hashes all go in and out very effortlessly :

db.mycollection.save({a:10, b:2});

We've had it supporting news-ish/blog-ish websites in production for a year now, and it does fine there. It does fine as a large object store - think big binary blobs here, like images and videos. We have a POC in progress where we leverage the server-side JS execution feature to provide transaction-like isolation for high-performance shopping-cart/inventory management. (4k a second at last check on a mac desktop). It has some interesting potential as a persistent cache - one where you aren't afraid to restart the cache for fear of the hammering the backend data store will receive.

I think that this DB has a lot of potential, and I look forward to seeing what other kinds of problems it can solve. Download it and try it. We have it available for OS X 32-bit and 64-bit, Linux 32-bit and 64-bit, Windows 32-bit, Solaris 32-bit. Let us know what you think.

http://www.mongodb.org

About this Entry

This page contains a single entry by Geir published on February 18, 2009 7:41 AM.
