couchdb-dev mailing list archives

From "Damien Katz (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1342) Asynchronous file writes
Date Thu, 17 Nov 2011 18:49:53 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152222#comment-13152222 ]

Damien Katz commented on COUCHDB-1342:
--------------------------------------

I don't mean to imply that Paul, or any committer, isn't smart enough to handle a flush call.
I _know_ Paul has the smarts and talent to deal with much more complexity. What I am saying
is that if a flush call requirement makes it so that someone can't work on the internals of
CouchDB, then they aren't suited for core database development. Database engines are complex
beasts.

Paul's point that the flush call can maybe be gotten rid of seems right. Originally,
we didn't have the code that prevented the write queue from getting overwhelmed, because in our
product it's not possible. But I added it to make the rest of the enhancements suitable for
Apache, and now it seems it could be used to prevent the reads of unflushed data. However,
there is another optimization coming where a raw erlang FD is used in a calling process to
avoid messaging overhead (another big performance improvement in certain long operations),
which will maybe make it necessary again. We can remove it in the meantime, but it may need
to be added back in the future.

The concern with doubling the number of non-db file descriptors is a real one. How big of a concern
is this? Do you have ideas on how to fix it? Can we address this post check-in?

Your 3rd and 4th concerns aren't Apache user concerns, but can be easily addressed after check-in.
I have no objections, but I would prefer we have a culture of small changes/environment-specific
changes like that happening after check-in. That will increase the rate of progress on
the project in general. If you agree, would you be willing to add those changes post check-in?

The 5th concern would definitely make code more complicated for callers, and would involve
them batching what is usually a non-optimal amount of data. This code makes the batching automatic
and parallelizes the writes, retiring batched data as fast as it can and preventing the batching
of too much data.
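
As a rough illustration of "preventing the batching of too much data", here is a minimal Python sketch of one way to cap the bytes waiting in a write queue. It is not taken from the patch (which is Erlang); the class name `ByteBudget`, its methods, and the choice to block the producer are all assumptions made for the example.

```python
import threading


class ByteBudget:
    """Hypothetical helper that caps the bytes waiting in a write queue."""

    def __init__(self, max_pending_bytes):
        self._max = max_pending_bytes
        self._pending = 0
        self._cond = threading.Condition()

    def reserve(self, nbytes):
        """Called by a producer before queueing a chunk; blocks while full."""
        with self._cond:
            # An oversized chunk is let through when nothing else is pending,
            # so a single large write cannot block forever.
            while self._pending and self._pending + nbytes > self._max:
                self._cond.wait()
            self._pending += nbytes

    def release(self, nbytes):
        """Called by the writer once nbytes have reached the file."""
        with self._cond:
            self._pending -= nbytes
            self._cond.notify_all()

    @property
    def pending(self):
        with self._cond:
            return self._pending
```

A producer would call `reserve(len(chunk))` before handing a chunk to the writer, and the writer would call `release(len(chunk))` after the chunk is on disk, so callers never have to pick a batch size themselves.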
                
> Asynchronous file writes
> ------------------------
>
>                 Key: COUCHDB-1342
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Jan Lehnardt
>             Fix For: 1.3
>
>         Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writes them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes use of the IO subsystem: for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit: https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
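
The mechanism described in the issue (immediate reply with the target position, a dedicated writer that batches chunks, and a blocking flush) can be sketched roughly as follows. This is a Python illustration only, not the Erlang code in the patch; `AsyncAppendFile` and its method names are hypothetical, and a thread stands in for the dedicated child process.

```python
import os
import queue
import threading


class AsyncAppendFile:
    """Hypothetical sketch of an append-only file with asynchronous writes."""

    def __init__(self, path):
        self._fh = open(path, "ab")
        self._next_pos = os.path.getsize(path)  # offset of the next chunk
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def append(self, chunk):
        """Queue chunk and return at once the offset where it will land."""
        # The lock keeps assigned offsets and queue order consistent.
        with self._lock:
            pos = self._next_pos
            self._next_pos += len(chunk)
            self._queue.put(chunk)
        return pos

    def flush(self):
        """Block until every chunk queued so far has been written."""
        self._queue.join()

    def close(self):
        self.flush()
        self._queue.put(None)  # sentinel that stops the writer
        self._writer.join()
        self._fh.close()

    def _drain(self):
        while True:
            batch = [self._queue.get()]  # wait for at least one item
            while True:  # then take whatever else is already queued
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            stop = None in batch
            data = b"".join(c for c in batch if c is not None)
            if data:
                self._fh.write(data)  # one write call for the whole batch
                self._fh.flush()
            for _ in batch:
                self._queue.task_done()
            if stop:
                return
```

On a newly created file, `append(b"abc")` returns 0 and a following `append(b"de")` returns 3, both before anything is necessarily on disk; `flush()` then blocks until the data is written. Note that this sketch's `flush()` waits for all queued chunks, not only those requested by the calling thread, which is a simplification of what the issue describes.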

