couchdb-dev mailing list archives

From "Damien Katz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-568) When delayed_commits = true, keep updated btree nodes in memory until the commit
Date Wed, 11 Nov 2009 19:54:39 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776617#action_12776617 ]

Damien Katz commented on COUCHDB-568:
-------------------------------------

I'm not sure how much parallelizing the writes buys you with an append-only architecture.
Right now with complex keys, it's possible to make the CPU the bottleneck (as we saw with
the Raindrop Megaview problem), but typically the bottleneck is how fast we can stream the
writes to disk. I think it would be more fruitful to optimize the current architecture for
CPU and memory IO until we make the disk the bottleneck.

Another possible optimization would be to have a bunch of uncommitted actions that stay in
memory, and as you read-query the btrees, the actions are applied on the fly into the nodes
at the appropriate time. I've wanted to do this for a long time, but never got around to it, as
it's fairly complex and would need to be applied to most every btree operation. Reductions
would be trickiest.

But with this optimization it's possible to do 2 things:
1. virtually insert/remove items in the btree without writing them to disk.
2. write these actions to disk but not to the btrees, batching them up for more efficient
btree inserts, and still committing them to disk. The batch amount would need to be tuned
for optimal performance.
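
To make the two points above concrete, here is a minimal Python sketch. It is not couch_btree code: the BufferedTree class, its method names, and the batch_size parameter are all hypothetical, a plain dict stands in for the on-disk btree, and a list stands in for the actions appended to disk. Reductions, which would be the trickiest part, are left out entirely.

```python
class BufferedTree:
    """Toy model: uncommitted actions are applied on the fly at read time.

    Assumptions (not from CouchDB): a dict stands in for the on-disk btree,
    a list stands in for the append-only action log, and batch_size is a
    made-up tuning knob.
    """

    def __init__(self, batch_size=3):
        self.tree = {}        # stand-in for the on-disk btree
        self.log = []         # stand-in for actions written to disk (point 2)
        self.pending = {}     # key -> (op, value), not yet in the btree
        self.batch_size = batch_size
        self.tree_writes = 0  # how many batched btree updates have happened

    def insert(self, key, value):
        self._record("insert", key, value)

    def remove(self, key):
        self._record("remove", key, None)

    def _record(self, op, key, value):
        # The action is logged without touching the btree.
        self.log.append((op, key, value))
        # The latest action per key wins.
        self.pending[key] = (op, value)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def lookup(self, key):
        # Point 1: pending actions shadow whatever the btree holds.
        if key in self.pending:
            op, value = self.pending[key]
            return value if op == "insert" else None
        return self.tree.get(key)

    def flush(self):
        # Apply every pending action to the btree in one batch.
        if not self.pending:
            return
        for key, (op, value) in self.pending.items():
            if op == "insert":
                self.tree[key] = value
            else:
                self.tree.pop(key, None)
        self.pending.clear()
        self.tree_writes += 1
```

In this toy, four logged actions with batch_size=3 result in a single btree update, and lookups before the flush already see the pending inserts and removes. A real version would have to apply the same overlay to range reads and most other btree operations.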

> When delayed_commits = true, keep updated btree nodes in memory until the commit
> --------------------------------------------------------------------------------
>
>                 Key: COUCHDB-568
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-568
>             Project: CouchDB
>          Issue Type: Improvement
>    Affects Versions: 0.10
>            Reporter: Adam Kocoloski
>
> rnewson reported on IRC that the new batch=ok implementation results in significantly
> larger overhead in the .couch files.  This makes sense; the old batch mode waited 1 second
> before saving, but the new implementation just updates the doc asynchronously.  With fast
> hardware and moderate write rates it's likely that each document is being written separately.
> The overhead presumably arises from frequently updated btree inner nodes being written
> to disk many times over.  I'm interested in exploring a modification of the delayed_commits
> mode whereby the updated btree nodes are not actually written to disk immediately, but are
> instead held in memory until the commit.  I'd like to think that this will result in more
> compact files without any decrease in durability.  New read requests would still be able to
> access these in-memory nodes.
> I realize the notion that updates go directly to disk is baked pretty deeply into couch_btree,
> but I still thought this was worth bringing up to a wider audience.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

