A lot of hype... And we still have db level locking. If document level is too difficult, at LEAST do collection level (not that it is much better, but at least it's some real improvement).
I was reading somewhere that Mongo can't do document level/record level locking because of mmap'ed files. The whole database is memory mapped, and mmap doesn't understand the underlying data structures; it views the whole file as a single large blob.
Ditching mmap will not be that easy, because most of the speed and simplicity of Mongo comes from using mmap.
mmap does complicate things. A traditional database often works using write ahead logs. The log holds changes made to the database over time - so when you want to write a change to your DB, you put 'I'm changing value x.y to 50' in your WAL. Sometime later the actual data pages holding the modified x data structure can be written out to disk. If you have a crash before the data pages get written out, you can 'replay' your WAL file to redo all the lost changes.
Unfortunately, a central requirement of a write ahead log is that the data pages must not get written before the WAL. If that were to occur, and there were a crash, the system wouldn't know that it had to undo the changes to the data pages, leaving you with corrupt data. mmap generally doesn't provide the ability to pin your dirty pages in memory - they're subject to getting flushed any time the system is under memory pressure - so it makes logging a lot more complicated to get right.
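The ordering constraint described above can be sketched in a few lines. This is a toy model of write-ahead logging (illustrative only, not MongoDB's actual journal format): the log record is durably recorded *before* the data page is updated, so a crash between the two steps can be repaired by replaying the log.

```python
# Toy write-ahead log: log first, data pages later, replay on recovery.
class ToyWAL:
    def __init__(self):
        self.log = []     # stands in for the on-disk journal
        self.pages = {}   # stands in for the on-disk data pages

    def write(self, key, value, crash_before_page_write=False):
        self.log.append((key, value))   # step 1: append to the WAL (and fsync)
        if crash_before_page_write:
            return                      # simulate a crash at the worst moment
        self.pages[key] = value         # step 2: data page written later

    def recover(self):
        # Replay every logged change. Replaying is idempotent, so it is
        # safe to reapply changes that did reach the data pages.
        for key, value in self.log:
            self.pages[key] = value

db = ToyWAL()
db.write("x.y", 50, crash_before_page_write=True)
assert "x.y" not in db.pages   # the data page was lost in the "crash"
db.recover()
assert db.pages["x.y"] == 50   # replaying the WAL restores the change
```

If the data page had been flushed *before* the log entry and the crash hit in between, recovery would have no record telling it the page was half-updated, which is exactly the corruption scenario the parent comment describes.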
Just to note, the MongoDB journal is basically a write ahead log and has been around since 1.8. It's not simple, you are correct: it involves remapping portions of memory privately, and leads to some inflated numbers on the virtual memory reporting side. There's a great write up here:
Are you implying that a performant Mongo DB must keep all data in RAM? That's not true. We've got almost 1TB in a Mongo DB, and we sure don't have that much RAM.
You do have to be able to keep your indexes in RAM, but that's much less limiting.
Not 1:1, no - thanks for clarifying that. But depending on the content and the indexing, there is a strong correlation between the size of the database and the memory requirements under Mongo DB.
It doesn't use mmap for locking though - and in any case one database is already multiple files (2GB max). There are internal data structures used by MongoDB and the database itself takes care of locking, mmap just gets the data into memory.
Heck I'd even be happy for them to do locking on a per file basis. (For those unfamiliar with mongodb, the actual database storage is broken up into 2GB files.)
You are confusing two things. I am talking about the files making up a database. MongoDB doesn't use one file per database - instead it allocates files up to 2GB in size (the first one starts at 64MB and then they double in size until the 2GB limit). So for example a database that is 200GB will consist of (roughly) 100 2GB files.
What you linked to is a consequence of memory mapping. Mapping a single 2GB file in a 32 bit process will use up virtually all the address space and you couldn't map more than one at a time.
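The allocation pattern described above (files starting at 64 MB and doubling up to a 2 GB cap) can be checked with a little arithmetic. The numbers follow the comment, not an authoritative spec of MongoDB's allocator:

```python
# Sketch of the preallocation pattern: 64 MB, doubling until the 2 GB cap.
def datafile_sizes(total_gb):
    sizes, size_mb = [], 64
    remaining_mb = total_gb * 1024
    while remaining_mb > 0:
        sizes.append(size_mb)
        remaining_mb -= size_mb
        size_mb = min(size_mb * 2, 2048)   # double until the 2 GB limit
    return sizes

files = datafile_sizes(200)
assert files[:5] == [64, 128, 256, 512, 1024]   # the doubling prefix
assert all(s == 2048 for s in files[5:])        # then 2 GB files
assert len(files) == 105                        # "roughly 100" 2 GB files
```

So a 200 GB database does indeed end up as about a hundred 2 GB files plus the small doubling prefix, which is why per-file locking would already be a meaningful granularity improvement.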
Every Mongo release announcement I hold out hope for an improvement to the database-level write lock, and every time I'm disappointed. Considering this is probably the #1 or #2 complaint people have with Mongo, I'm surprised it hasn't been addressed ahead of some of the other features that have made it in to recent releases.
Note also that the lock blocks readers, and that writes are given priority over reads. Consequently even small volumes of writes can have major effects on readers.
In my experience, even one producer caused a full database lock for hours on our production server. We have one mongo server, and an erroneous scheduled task of ours started at about 6am. It's only one process, but the task basically re-syncs an entire collection (which was about 80,000 writes).
That single producer caused wide-scale locking/hanging for all readers on the website and I had to manually stop the task during business hours because of that. Oy!
Not for single-server performance. The database level lock severely limits MongoDB's single server performance. Just look up the sysbench benchmark comparing MongoDB with TokuMX (which I work on).
With this release the aggregation framework got super powerful. It now returns a cursor, so we can get the aggregation results and iterate over them. No more 16MB result limitation either...
Well, aggregation in MongoDB allows you to write SQL-like queries, and before 2.6 if the aggregated result was larger than 16MB it would throw an error; now it gives you a cursor. You can fetch the results and send them to another collection. For an example aggregation query in mongo have a look here: http://docs.mongodb.org/manual/tutorial/aggregation-zip-code...
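For readers who haven't seen the framework, here is what a simple `$group`/`$sum` pipeline computes, sketched in plain Python. The collection contents are made up, loosely modeled on the zip-code tutorial linked above:

```python
# Hypothetical documents, loosely modeled on the zip-code tutorial.
docs = [
    {"state": "NY", "pop": 100},
    {"state": "NY", "pop": 50},
    {"state": "CA", "pop": 70},
]

# Roughly equivalent to:
#   db.zipcodes.aggregate([
#       {"$group": {"_id": "$state", "totalPop": {"$sum": "$pop"}}}
#   ])
def group_sum(documents, key, field):
    totals = {}
    for doc in documents:
        totals[doc[key]] = totals.get(doc[key], 0) + doc[field]
    return [{"_id": k, "totalPop": v} for k, v in totals.items()]

result = group_sum(docs, "state", "pop")
assert {"_id": "NY", "totalPop": 150} in result
assert {"_id": "CA", "totalPop": 70} in result
```

Before 2.6 the whole `result` had to fit in one 16MB response document; with a cursor you pull these result documents incrementally, just like a normal query.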
I've been using Elasticsearch as a primary database for my new project, and it has basically been a good NoSQL db that happens to have great search. However, peripheral tools (performance testing, hosting) have been a bit rough.
How do the two databases compare now? Is search improving in Mongo, or is that something they are not really worrying about at the moment?
2.4 introduced text indexes for full text search as a beta, and 2.6 finishes the job of fully integrating them into the product - they are fully supported with the new release (including in the aggregation framework).
In terms of how they compare, I'm not familiar enough with Elasticsearch to comment, but for basic text searching needs, the implementation in MongoDB is pretty decent. More here:
The price is unbelievably fair. Try deploying mysql in the cloud, and while you are at it, you get Redis as well for cache. Then you hit the wall: you need search in your application, so you deploy Elasticsearch as a service as well. Do the math and compare to ours.
Our pricing starts as low as $15 per month
The cheapest instance of Amazon CloudSearch is around $79 monthly. You then have to deploy a transactional DBMS, and then most likely S3 for storage as well.
This is good write-up about what's new with actual numbers. http://devops.com/news/mongodb-2-6-significant-release-mongo...
Quote: "MongoDB 2.6 provides more efficient use of network resources; oplog processing is 75% faster; classes of scan, sort, $in and $all performance are significantly improved; and bulk operators for writes improve updates by as much as 5x."
Awesome news. I am excited for the aggregation cursor. As much as I love some of the alternatives that are almost ready I still turn to mongo for a vast majority of my deployments. Hopefully it will keep getting better and pushing others to do the same.
Can someone with more MongoDB experience give me your thoughts on the upgrade difficulty here? Worth doing soon, or waiting for a point release? Does this require a data rebuild/update process (coming from 2.4)?
"the upgrade from MongoDB 2.4 to 2.6 is a binary-compatible drop-in upgrade: shut down the mongod instances and replace them with mongod instances running 2.6."
IMO unless you desperately need one of the new features I would hold off a few weeks. With a release this big I'd expect there will be some bugs and wouldn't be surprised to see 2.6.1 shortly.
-> An ideal use case: capturing and storing unstructured data, typically tweets. Tweet structure is JSON-based and quite complex, with many fields and substructures.
It's incredibly easy to store and manipulate such data with MongoDB without even knowing all the details of the fields!
It's also a good use case because MongoDB is fairly good at adding data, pretty bad at deleting data.
The same goes for some other JSON document-oriented databases (like Elasticsearch), but MongoDB is a good compromise in many areas. The query language is easy to understand and powerful; the biggest issue is the difficulty of doing complex computation and aggregation. Map-reduce helps, and the aggregation framework helps too, but in this area SQL is generally much faster, for instance.
It could have been built with CouchDB however some features such as ad-hoc querying and partial document updates make MongoDB a more compelling choice (albeit prone to some scalability issues until mongodb version 2.8 hopefully lol!).
On the list of reasons to use MongoDB, shouldn't being able to pass over JSON be at the bottom of the list given how trivial it is to pass a mysql row as a json array?
I can't see how that's a defining feature for it, because couchdb and others also do this really really well. Most of the time you don't even need db drivers, because couchdb is just a REST server.
Cursor for aggregate, proper explain for aggregate, index intersection, $redact and other cool operators, Multi* in Geospatial, faster execution and, foundation for document-level locking which should be introduced in MongoDB 2.8. I must say I'm happy with this release.
Playing with Meteor and mongo recently, I have found mongo seems a little bit strange from a traditional SQL point of view, e.g. do I need to embed or reference? Can anyone recommend a good book or source?
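The embed-vs-reference choice in miniature, with hypothetical "post" and "comment" shapes (not taken from any particular book):

```python
# Embedded: comments live inside the post document. One read fetches
# everything, but the document grows with every comment.
post_embedded = {
    "_id": 1,
    "title": "hello",
    "comments": [{"author": "ann", "text": "hi"}],
}

# Referenced: comments are separate documents pointing back at the post,
# joined by the application (the foreign-key pattern from SQL).
post = {"_id": 1, "title": "hello"}
comments = [{"post_id": 1, "author": "ann", "text": "hi"}]

related = [c for c in comments if c["post_id"] == post["_id"]]
assert related[0]["text"] == "hi"
assert post_embedded["comments"][0]["text"] == "hi"
```

The usual rule of thumb is to embed data that is read together and bounded in size, and reference data that is shared, unbounded, or queried independently.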
It means the foundation has been laid. 2.6 included a lot of refactoring and rewriting of some core subsystems, with the apparent goal of eliminating technical debt so they can make more impactful changes in 2.8.
I am a big fan of their focus on manageability for sharded databases. I am less of a fan of their db internals that might require you to use many more shards than a more performant engine. More details at http://smalldatum.blogspot.com
The main reason I use MongoDB on Node is the maturity of the Mongoose ORM - I've used Node-ORM 2, BookshelfJS, and SequelizeJS and none of them felt as mature as Mongoose.
How is it possible to have aggregation cursors without crunching the complete data set in advance (aka map/reduce) and still have consistent and correct results?
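One way this can work without a single 16MB result document (a sketch, not MongoDB's implementation): streaming stages like `$match` and `$project` process one document at a time, while blocking stages like `$group` still have to consume all their input, but can *emit* their results incrementally through the cursor rather than packing them into one response.

```python
# Pipeline stages as Python generators: streaming vs. blocking stages.
def match(docs, pred):
    for d in docs:                 # streams one document at a time
        if pred(d):
            yield d

def group_count(docs, key):
    counts = {}
    for d in docs:                 # must see all input (blocking stage)
        counts[d[key]] = counts.get(d[key], 0) + 1
    for k, v in counts.items():    # ...but emits results lazily
        yield {"_id": k, "count": v}

data = [{"kind": "a"}, {"kind": "b"}, {"kind": "a"}]
cursor = group_count(match(data, lambda d: d["kind"] in ("a", "b")), "kind")
assert list(cursor) == [{"_id": "a", "count": 2}, {"_id": "b", "count": 1}]
```

Consistency within one pipeline run is a separate question from result delivery: the blocking stages still materialize their working set before the first result is handed to the client.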
The only reasonable assumption is circumstances that do not involve hardware failure, the binary being compiled incorrectly, the source code being modified or replaced by someone downstream, the libraries it is using being corrupt or having been replaced by ABI-incompatible variants... none of these are reasonable circumstances; one would then further assume that the person posting has run into reasonable circumstances where MongoDB often crashes, which is not a stretch given the number of bugs that are filed against it that talk about this kind of issue.
Exactly; well said. It's been about a year but basically I had a replica set where sometimes one replica or the other would segfault and I'd have to manually delete its data files and re-replicate, after which it was fine. Happened about every few months. It was clearly based on application behavior and not an inherent system problem such as those you listed.
For me the really irritating thing is the observation that when I want to enjoy flame wars about Mongo, I seem to see a lot of their 'try TokuMX' everywhere. Last time I checked they were on the 2.2 codebase, unless that recently changed. Mongo 2.4 was a significant upgrade, and assuming I'm correct above, I feel like the Tokutek team disregards that. I get that it's good marketing to suggest an alternative, but going around criticising that which you've built upon doesn't sit well with me.
Last time you checked was a while ago, it's 2.4 compatible now (except geo and full-text) and has been since TokuMX 1.3.
We generally don't criticize indiscriminately. MongoDB has a lot of good sides and we embrace and extend those, and where it has faults we try to work around or replace them. Our core strength is fast, reliable, compressed storage and MVCC semantics, so obviously we talk about that a lot, but we also understand and acknowledge that a large amount of TokuMX's success, to the degree it has some, is due to the excellent parts of MongoDB.
As an example, I personally am really excited about what MongoDB has done with aggregations in 2.6 (and what seems to be coming down the pipe soon), and I can't wait to merge it in to TokuMX. We all get stronger together.
They have now. However 2.6 uses different package names, configuration file name, log file etc. It is generally mongod instead of the earlier mongodb (eg before it was /etc/mongodb.conf and is now /etc/mongod.conf).
This means no automatic upgrades to 2.6, and sysadmin action to correct config file name etc.
mongodb is the best database in the whole wide world at the moment. I encourage everyone to jump in mongodb for agile web scale development with full big data capability.
I'm sure that it is a viable choice for some use cases; it's just that I haven't found a use case for it yet.
Being able to choose from PostgreSQL, Redis, Cassandra, heck, even ElasticSearch made me always choose one of those over MongoDB, at least for the problems which I had been trying to solve.
By the way, if you are looking for all the above functionality provided by all the DBMSes you mentioned in a single DBMS instance, you can check out amisaserver.com. Polyglot persistence is just another fad.
If I am not mistaken, this accurately describes the limits of MongoDB in terms of mapping relations. I'm not a Mongo expert because no one could convince me otherwise to date, somebody correct me?
>You are referring to using an index, correct? Because grep is absolutely, madly efficient for a doing a full search.
I'm not sure why you imply that a full search is incompatible with an index.
Perhaps you meant "full scan", that is reading everything while searching, instead of "full search" (searching everything). The first is not a prerequisite for the second.
In any case, grep is a very inefficient way of doing a full search. An index is so much faster it's not even funny.
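The difference being argued about, in miniature: grep is a linear scan that touches every document on every query, while an inverted index (built once) answers the same query with a lookup. Both return the same ids; only the work per query differs.

```python
# Three toy "documents", keyed by id.
docs = {1: "the quick brown fox", 2: "lazy dog", 3: "quick thinking"}

def full_scan(docs, word):
    # grep-style: reads every document on every query.
    return {i for i, text in docs.items() if word in text.split()}

def build_index(docs):
    # Inverted index: word -> set of document ids. Built once, reused.
    index = {}
    for i, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(i)
    return index

index = build_index(docs)
assert full_scan(docs, "quick") == index["quick"] == {1, 3}
```

The scan is O(total text size) per query; the index lookup is roughly O(1) per word, which is why an index wins so decisively once the data stops fitting in a single grep-able file.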
>The index portion of a file system are called files and directories.
Those are just indexes for the names of the files and folders, and a few other select metadata. Nothing like a full-text search index, or even actual indexes on metadata.
(Some filesystems allow those too, e.g. in BeOS, but nowhere near as comprehensive and flexible as using a dedicated tool for this, be it MongoDB or something else).
>Several file names can refer to the same data. Those are called hard links. So with hard links, I can refer to a Foo by their related Bar.
Sounds like a convoluted and inefficient way of building something somewhat like a "document database" with 1/10 the features (if that).
>I'm not a Mongo expert because no one could convince me otherwise to date, somebody correct me?
I'm far from a fan of Mongo, but you seem like you have already made up your mind, and nothing will change it.
Plus, if a filesystem is enough of a document database for you (with no cheating, e.g. piling up tons of hacks and add-ons like external full-text scanning tools), then by all means, use one.
It's very close to JSON objects. In fact, it uses JSON/BSON. So it's Hashes of data structures, which MongoDB makes accessible quickly, like a file system must be.
You can also use GridFS to store files in the document database, which actually breaks files into chunks and stores them in collections, also just like a FAT table.
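The chunking idea behind GridFS can be sketched as follows. The document shape is simplified, and the 255 KB chunk size is the commonly cited GridFS default rather than anything this code depends on:

```python
CHUNK_SIZE = 255 * 1024  # GridFS's commonly cited default chunk size

def to_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    # Split a file into fixed-size chunk documents, numbered by n,
    # each pointing back at the owning file -- much like FAT entries.
    return [
        {"files_id": file_id, "n": n, "data": data[i:i + chunk_size]}
        for n, i in enumerate(range(0, len(data), chunk_size))
    ]

def reassemble(chunks):
    # Reads sort chunks by n and concatenate them back into the file.
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

blob = b"x" * (600 * 1024)          # a 600 KB file -> 3 chunks
chunks = to_chunks("f1", blob)
assert len(chunks) == 3
assert reassemble(chunks) == blob
```

Since each chunk is an ordinary document, chunks get indexed (on `files_id` and `n`) and replicated like any other data in the database.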
In the case of MongoDB, you dump a blob called BSON, which can itself be larger than the equivalent JSON. Paradoxically, this is touted as a space-efficient binary serialization; you then read it back using an index or something.