couchdb-dev mailing list archives

From "Filipe Manana (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-1023) Batching writes of BTree nodes (when possible) and in the DB updater
Date Wed, 12 Jan 2011 02:08:45 GMT

[ https://issues.apache.org/jira/browse/COUCHDB-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980504#action_12980504 ]

Filipe Manana commented on COUCHDB-1023:
----------------------------------------

Hi Randall, no, I wasn't aware of your experiment.

Taking a quick look at it, the main difference seems to be that yours does an extra map/fold
over each key tree and then maps each document to the respective summary.

As for the term_to_binary before a gen_server call, I don't think it offers any gain. Do you
or anyone else know exactly which is more expensive: converting a term to a binary or copying a
term?
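
A rough way to measure both costs would be something along these lines (a minimal sketch only;
the module name and the throwaway sink process are just for illustration, and sending a message
is used here to measure the copy onto the receiving process's heap):

    -module(copy_vs_t2b).
    -export([compare/1]).

    %% Time erlang:term_to_binary/1 on a term, then time sending the same
    %% term to another process (which copies it). Both times are in
    %% microseconds and are only a ballpark.
    compare(Term) ->
        {T2B, _} = timer:tc(erlang, term_to_binary, [Term]),
        Sink = spawn(fun() -> receive _ -> ok end end),
        {Copy, _} = timer:tc(erlang, send, [Sink, Term]),
        io:format("term_to_binary: ~B us, message copy: ~B us~n", [T2B, Copy]).

Calling copy_vs_t2b:compare/1 with a representative document term would give a rough answer.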

And I don't think the complexity of adding a write-through cache is worth it: more code, one
more server, and possibly a new bottleneck. For that I would rather use the delayed_write option
of Erlang's file module.
But I might be wrong. A concrete implementation and benchmarks would definitely change
my mind :)
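
For reference, this is roughly what using delayed_write looks like when opening a file (a
minimal sketch; the file name, buffer size and flush delay below are only illustrative):

    %% Let the file module itself buffer small writes and flush them in
    %% larger chunks; file:close/1 flushes whatever is still buffered.
    {ok, Fd} = file:open("test.couch",
                         [append, raw, binary,
                          {delayed_write, 64 * 1024, 2000}]),  %% 64 KB buffer, 2 s max delay
    ok = file:write(Fd, <<"some document summary">>),
    ok = file:close(Fd).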



> Batching writes of BTree nodes (when possible) and in the DB updater
> --------------------------------------------------------------------
>
>                 Key: COUCHDB-1023
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1023
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Filipe Manana
>
> Recently I started experimenting with batching writes in the DB updater.
> For a test with 100 writers of 1 KB documents, for example, the updater most often collects
> between 20 and 30 documents to write.
> Currently it does a file:write operation for each one. Not only is this slower, it also
> implies more context switches and stresses the OS/filesystem by allocating a few blocks at a
> time very often (since we use a pure append-only write mode). The same can be done for the
> BTree node writes.
> The following branch/patch is an experiment in batching writes:
> https://github.com/fdmanana/couchdb/compare/batch_writes
> In couch_file there's a quick test function that compares the time taken to write X blocks
> of size Y versus writing a single block of size X * Y.
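
(For illustration only, not the actual patch: the comparison described above amounts to
something like the following, where N separate file:write calls are timed against a single
file:write of an iolist holding all N binaries. The file names and the all-zero payload are
made up for the sketch.)

    %% Write N binaries of Size bytes one call at a time, then write the
    %% same binaries again as one iolist in a single call, timing both.
    test(Size, N) ->
        Bin = <<0:(Size * 8)>>,
        Bins = lists:duplicate(N, Bin),
        {ok, Fd} = file:open("multi.bin", [append, raw, binary]),
        {TMulti, _} = timer:tc(fun() -> [ok = file:write(Fd, B) || B <- Bins] end),
        ok = file:close(Fd),
        {ok, Fd2} = file:open("batch.bin", [append, raw, binary]),
        {TBatch, _} = timer:tc(fun() -> ok = file:write(Fd2, Bins) end),
        ok = file:close(Fd2),
        io:format("multi writes: ~B us, batch write: ~B us~n", [TMulti, TBatch]).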
> Example:
> Eshell V5.8.2  (abort with ^G)
> 1> Apache CouchDB 1.2.0aa777195-git (LogLevel=info) is starting.
> Apache CouchDB has started. Time to relax.
> [info] [<0.37.0>] Apache CouchDB has started on http://127.0.0.1:5984/
> 1> couch_file:test(1000, 30).
> multi writes of 30 binaries, each of size 1000 bytes, took 1920us
> batch write of 30 binaries, each of size 1000 bytes,  took 344us
> ok
> 2> 
> 2> couch_file:test(4000, 30).
> multi writes of 30 binaries, each of size 4000 bytes, took 2002us
> batch write of 30 binaries, each of size 4000 bytes,  took 700us
> ok
> 3> 
> One order of magnitude less is quite significant, I would say.
> Lower response times are mostly noticeable when delayed_commits is set to true.
> Running a write-only test with this branch gave me:
> http://graphs.mikeal.couchone.com/#/graph/8bf31813eef7c0b7e37d1ea25902e544
> While with trunk I got:
> http://graphs.mikeal.couchone.com/#/graph/8bf31813eef7c0b7e37d1ea25902eb50
> These tests were done on Linux with ext4 (and OTP R14B01).
> However, I'm still not 100% sure whether this is worth applying to trunk.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

