couchdb-dev mailing list archives

From "Paul Joseph Davis (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1342) Asynchronous file writes
Date Fri, 18 Nov 2011 00:39:52 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152507#comment-13152507 ]

Paul Joseph Davis commented on COUCHDB-1342:
--------------------------------------------


@Damien

> However, there is another optimization coming where a raw erlang FD is used in a calling
> process to avoid messaging overhead (another big performance improvement in certain long operations),
> which will maybe make it necessary again.

I'm not sure what you mean here. Something along the lines of a file:open call in the couch_db_updater
process (and couch_mrview_updater)? If so that's an interesting idea. Seems like we could
make couch_file handle that quite easily along the lines of how file handles #prim_file vs
#file (if I recall those record names correctly). This could also solve some of the fd duplication
if we only need an extra fd for views that are updating.

> The concern with doubling the # non-db file descriptors is a real one. How big of a concern
> is this?

The thing is, I'm not certain how it'll behave, which is why it concerns me. Is it a matter of
just making sure that ulimit is set sufficiently high? How high is sufficient? If I'm running
in production and I upgrade to a version of CouchDB that has this patch, can I at least guesstimate
how configs might need to change? Maybe I'm being overly paranoid and it's not an issue. I
dunno, and that's exactly why it concerns me.

> The 5th concern would definitely make code more complicated for callers

I agree. I should've prefaced that bit with an "I wonder if in the future there's a follow-up
direction we can go". It only occurred to me as I was finishing that comment, so I figured I'd
write it down.

> Your 3rd and 4th concerns aren't Apache user concerns, but can be easily addressed after
> check-in.

I have no idea what you mean by "Apache user concerns" here. If you're referring to "no one
cares how the sausage is made so long as it's faster" then I'm going to have to disagree. Strongly.
Saying that databases are complicated so we shouldn't concern ourselves with code quality
is just going to leave us with a source tree in an even worse state than it already is.

And I'd like to address this argument about progress and the desires of users. This patch
was submitted to JIRA yesterday. My initial review was up within 3.5h. This patch changes
how the file abstraction works. In a database. As far as I'm concerned development on this
started yesterday at 13:28 when Jan uploaded the patch to JIRA. If you wanted things to be
moving more quickly at this stage you should have been developing this on a branch in git
and asking for input from the community.

Secondly, while I understand that you're highly motivated to help users by improving performance,
what does that have to do with the conversation about the technical merits of this patch?
This sense of urgency, that progress must be made so let's address the issues I brought up after
it's in trunk, is not a convincing argument. You could address my comments by spending thirty
minutes in an editor and resubmitting the patch. Instead you're asking me to clean this up
for you after it's committed.

Thirdly, every time someone asks, "Can it wait till it's on trunk?", all I hear is, "Can I
ignore what you just said and commit this anyway?" If I point at something and say that it's
broken, it's because I'm expecting either the patch to change or an explanation of why I'm wrong. And
I'm fine being wrong. It happens quite often. But this pattern of submitting patches and asking
for all concerns to be addressed after the patch is in trunk is starting to get a bit annoying.
If we want to adjust our policies around CTR vs RTC for larger patches, that's fine. Perhaps
adding an edge branch in git that will accept all our bigger, somewhat scary commits would
be beneficial. If we start doing automated package building then users could even pull bleeding
edge code to test. But I digress.

                
> Asynchronous file writes
> ------------------------
>
>                 Key: COUCHDB-1342
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Jan Lehnardt
>             Fix For: 1.3
>
>         Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to the process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and writes them to the file (batching them
> when possible). After issuing a series of write requests
> to the file module, the caller can call its 'flush'
> function, which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes use of the IO subsystem: for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit: https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
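
For readers who have not looked at the patch, the write path described in the
issue summary above can be sketched roughly as follows. This is not the patch's
code and not couch_file's API: it is a minimal Python illustration, and every
name in it (AsyncAppendFile, append, flush, close) is hypothetical. It assumes a
single append-only file, uses a thread where the patch uses a dedicated Erlang
child process, and simplifies flush so that it waits for all queued chunks
rather than only the caller's own.

```python
import os
import queue
import threading


class AsyncAppendFile:
    """Hypothetical append-only file with a background writer thread."""

    def __init__(self, path):
        self._fd = open(path, "ab")
        self._eof = os.path.getsize(path)   # next offset to hand out
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def append(self, chunk):
        """Return the offset the chunk will occupy without waiting for IO."""
        # Reserve the offset and enqueue under one lock so that queue
        # order always matches the offsets handed back to callers.
        with self._lock:
            pos = self._eof
            self._eof += len(chunk)
            self._queue.put(chunk)
        return pos

    def flush(self):
        """Block until every chunk queued so far has reached the file."""
        self._queue.join()

    def close(self):
        self._queue.put(None)               # sentinel: stop the writer
        self._writer.join()
        self._fd.close()

    def _write_loop(self):
        while True:
            batch = [self._queue.get()]
            # Batch whatever else is already waiting in the queue.
            while True:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            stop = None in batch
            self._fd.write(b"".join(c for c in batch if c is not None))
            self._fd.flush()
            for _ in batch:
                self._queue.task_done()
            if stop:
                return
```

A caller would issue several append calls, keep doing CPU-bound work with the
returned offsets, and call flush only at the point where it needs the data to
be in the file.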
