incubator-couchdb-user mailing list archives

From Sebastian Cohnen <sebastiancoh...@googlemail.com>
Subject Re: Comparison of MongoDB & CouchDB: MongoDB seems better on insert
Date Mon, 20 Dec 2010 23:01:10 GMT

On 20.12.2010, at 23:24, Paul Davis wrote:

> On Mon, Dec 20, 2010 at 5:20 PM, Sebastian Cohnen
> <sebastiancohnen@googlemail.com> wrote:
>> question inside :)
>> 
>> On 20.12.2010, at 23:02, Jan Lehnardt wrote:
>> 
>>> Hi,
>>> 
>>> On 20 Dec 2010, at 22:32, Chenini, Mohamed wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I found this info on the net at http://www.slideshare.net/danglbl/schemaless-databases
>>>> [...]
>>>> Does anyone know if this was verified?
>>> 
>>> I think the author's comment on slide 35 sums it up pretty nicely:
>>> 
>>> "Of course this is just one (lame) test."
>>> 
>>> Coming up with good numbers is hard, which means people reach for easy ways that produce bad ones.
>>> 
>>> I've written about the difficulties of benchmarking databases on my blog:
>>> 
>>>  http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
>>>  http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html
>>> 
>>> They should give you a few pointers on why this is hard.
>>> 
>>> --
>>> 
>>> To the point: CouchDB generally performs best under concurrent load. In the case of
>>> loading data into CouchDB, bulk requests* will speed things up further. To push CouchDB to
>>> its write limit, you want to use concurrent bulk requests (the specific numbers will depend
>>> on your data and hardware).
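
A rough sketch of what such a concurrent bulk load can look like (the database URL, batch
size and worker count below are placeholder assumptions, not recommendations):

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BULK_URL = "http://127.0.0.1:5984/benchmark_db/_bulk_docs"  # assumed local node/db
BATCH_SIZE = 1000   # docs per bulk request, tune for your data
WORKERS = 4         # concurrent bulk requests, tune for your hardware

def post_batch(batch):
    # one _bulk_docs request per batch of documents
    body = json.dumps({"docs": batch}).encode("utf-8")
    req = urllib.request.Request(BULK_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

def batches(total):
    # trivial example documents, chunked into bulk-sized lists
    batch = []
    for i in range(total):
        batch.append({"value": i})
        if len(batch) == BATCH_SIZE:
            yield batch
            batch = []
    if batch:
        yield batch

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    for status in pool.map(post_batch, batches(100000)):
        assert status in (200, 201)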
>> 
>> Does this really speed things up? I've tried this approach (concurrent bulk inserts)
>> with small/big docs and small/big bulk chunk sizes: the difference was not significant. I
>> thought this was reasonable, since writes are serialized anyway. The setup was one box
>> generating documents, assembling bulks in memory, and bulk inserting batches of complete
>> docs (incl. simple monotonically increasing ints as doc ids) into another node. Delayed
>> commit was off.
>> 
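For reference, the batches looked roughly like the following (any field other than _id is
made up for illustration; _id values have to be strings, even when they encode integers):

import json

def make_batch(start, size):
    # one bulk batch with sequential, monotonically increasing doc ids
    return {"docs": [{"_id": str(i), "payload": "..."}
                     for i in range(start, start + size)]}

print(json.dumps(make_batch(0, 3), indent=2))  # three docs with _id "0", "1", "2"
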
> 
> I think delayed commit would need to be on there, otherwise you'll be
> hitting fsync barriers for every bulk docs call, and those calls are
> serialized by the updater. Theoretically the speedups would come from
> letting the kernel manage the file buffers and what not.

delayed_commit was off because I needed to test insertion of lots of data (more than would
fit nicely into memory). I wanted to figure out whether normal bulk vs. concurrent bulk
inserts has an impact on insert performance. As I said, the difference was not significantly
better or worse. BTW: I didn't saturate the disks (mid-class SSDs), since couch was eating
up the CPU (3 GHz Core 2 Duo). This was some time ago; maybe it is more disk-bound now.
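
For completeness: as far as I know, the fsync behaviour can also be overridden per request
in the 1.x line via the X-Couch-Full-Commit header; a minimal sketch (URL and payload are
placeholders):

import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:5984/benchmark_db/_bulk_docs",
    data=json.dumps({"docs": [{"value": 1}, {"value": 2}]}).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "X-Couch-Full-Commit": "false"})  # defer the fsync for this request
with urllib.request.urlopen(req) as resp:
    print(resp.status)

Durability can then be requested explicitly at the end of the load with a POST to
/benchmark_db/_ensure_full_commit.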

> 
>>> 
>>> * http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
>>> 
>>> Unfortunately this means that these one-off benchmarks don't show any good numbers
>>> for CouchDB; fortunately, it also makes it easy to show that these one-off benchmarks
>>> don't really reflect common real-world usage and should be discouraged.
>>> 
>>> Hope that helps, let us know if you have any more questions :)
>>> 
>>> Cheers
>>> Jan
>>> --
>>> 
>> 
>> 

