couchdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jan Lehnardt (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1342) Asynchronous file writes
Date Wed, 16 Nov 2011 18:26:51 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151401#comment-13151401
] 

Jan Lehnardt commented on COUCHDB-1342:
---------------------------------------

Good review Paul, thanks! :)

Let me quote and reply inline to each of your points:

> This patch looks like its a mishmash of a couple patches that are applied slightly differently.
Not sure if that's a branch thing or what. But for instance, there's a lot of id_btree(Db)
-> Db#db.couch type changes.

These are deliberate. Currently we are maintaining an Fd/couch_file pair for reads and writes
in a bit of a clumsy way in the #btree.fd record and the couch_file module and do a switcheroo
for writes vs. reads. Now all of that is moved into couch_file and couch_db doesn't have to
concern itself with the details of efficient file access.

> There's also a switch between couch_file:sync(Filepath) and couch_file:sync(Fd). I'm
not sure if that's on purpose or not because we switched to Filepath on purpose. I see it
as being possible but I don't see a reference to it.

We used to pass in the filepath to allow the fsync() to be async so it wouldn't block any
readers or writers. That behaviour is now obsolete as couch_file will now pass the fsync()
request to it's writer child process.


> I'm really not a fan of exposing the inner workings of couch_file to every client as
can be seen by the massive number of flush calls that this creates. I find it quite likely
that we'll eventually end up missing one of these flush calls and causing a nasty bug because
the error would look like data hadn't been written. I'd want to see things rewritten so that
the usage of couch_file doesn't change. Easiest solution I see would be to automatically flush
buffers if a read request comes in for something that's not on disk.

Good point, API-wise this is a bit leaky and we should definitely look into making this better,
but I don't think this blocks the bulk of the improvements that this patch introduces. I'm
happy to open a new ticket about this. Good starting points are calling flush() inside couch_file
1) after writing a header and 2) if a read fails with {error, eof} (fwiw, Damien tried the
latter before but removed it again, the details elude me).

> I'm pretty sure we talked about this at one point, but can someone describe the differences
between this and Erlang's delayed_write mode for files?

The difference is that delayed_write does buffering whereas we just want to keep writing to
the file all the time. delayed_write would not write until the buffer is full or the timeout
is hit. In addition, we wouldn't be able to tell for delayed_commits=false writes when data
hit the file reliably.

> couch_file becomes much more complicated with this patch. I'd prefer to see it simplified
if at all possible. I'm not entirely certain how it'd look but I might start by making a couch_file_reader
and couch_file_writer gen_servers instead of having bare processes in couch_file.

The whole point of this patch is to make couch_file smarter and move the reader/writer abstraction
further down the stack and reap the associated performance benefits. Unless we have an alternate
patch, we can't really compare this.

> I have to say that this patch scares me a bit. If we make the switch to something like
this then we're opening up a whole new type of failure scenario where file "writes" appear
to succeed but then end up failing after the caller has moved on. While there are certainly
cases where I can see this being a fine tradeoff (view updaters, compactors, etc) it worries
me a bit for the main database file. The fact is that I can't (yet) reasonably consider all
of the possible new failure modes and convince myself that things are correct. I've already
seen some super weird reactions to things like running out of disk space, it seems like not
knowing about such an error until you've "written" a megabyte of data seems awkward.

This doesn't change the error scenarios. We already rely on monitoring to tell the request
process a couch_file write didn't work: 

  https://github.com/fdmanana/couchdb/blob/master/src/couchdb/couch_db.erl#L819 ff. (825 in
particular)
  
   
                
> Asynchronous file writes
> ------------------------
>
>                 Key: COUCHDB-1342
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>            Reporter: Jan Lehnardt
>             Fix For: 1.3
>
>         Attachments: COUCHDB-1342.patch
>
>
> This change updates the file module so that it can do
> asynchronous writes. Basically it replies immediately
> to process asking to write something to the file, with
> the position where the chunks will be written to the
> file, while a dedicated child process keeps collecting
> chunks and write them to the file (and batching them
> when possible). After issuing a series of write request
> to the file module, the caller can call its 'flush'
> function which will block the caller until all the
> chunks it requested to write are effectively written
> to the file.
> This maximizes the IO subsystem, as for example, while
> the updater is traversing and modifying the btrees and
> doing CPU bound tasks, the writes are happening in
> parallel.
> Originally described at http://s.apache.org/TVu
> Github Commit: https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message