couchdb-dev mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: [jira] [Commented] (COUCHDB-1342) Asynchronous file writes
Date Fri, 18 Nov 2011 11:45:57 GMT

On Nov 17, 2011, at 10:06 PM, Damien Katz (Commented) (JIRA) wrote:

> 
>    [ https://issues.apache.org/jira/browse/COUCHDB-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152608#comment-13152608 ]
> 
> Damien Katz commented on COUCHDB-1342:
> --------------------------------------
> 
> Paul, what I mean by "Apache users concerns" is that #3 isn't something that vanilla
> Apache CouchDB users deal with, but third parties who modify the code or embed it in interesting
> ways might (I suppose Cloudant has to deal with this). Perhaps I'm mistaken about that. I do
> think patches should only be concerned with the vanilla use cases in order to be considered
> check-in quality.
> 
> #4 is a style issue, not a correctness issue, or at least you haven't made a case that
> it's a correctness issue. I have no problems with you changing it to a style you prefer,
> but we should not expect submitters of patches to conform to an undocumented style.
> 
> There is no urgency around this patch; at Couchbase we can keep adding performance enhancements
> and drift our codebase further and further from Apache. I don't want to see that happen, but
> it only hurts the Apache project.

Damien,

I agree with both of these points. Your codebase at Couchbase is drifting, but you're not alone
in that, and we do need a culture where more correct, fast code is checked in. I've only had a
couple of days to look at this, and I've not had the time to read your Couchbase work. As I
look at this patch, almost every concern Paul is raising is technically valid. We do have to
consider more than vanilla CouchDB, as it gets embedded in BigCouch for example, and CouchDB
is designed to be distributed, right? I first ran a simple test, adding 10K empty docs, and
noticed a 40K difference in the db file size. Probably harmless, but I don't know why. There's
no real way to independently verify whether this patch changes the db layout other than via the
semantics of the code.

Databases are hard, as you mention, very hard. Without good performance they are next to useless,
but a lack of correctness is also problematic, certainly in some domains. I share others' frustration
with patches languishing. The patches I've submitted to date have all been small and have
often had to be refactored as the code migrated away (I think I have 3 now, 2 of them bugs).
COUCHDB-911, for example, is a real bug involving both couch_db and couch_db_updater, and, as
Adam notes, is not just a bulk docs issue. It reports a conflict but adds data to the db anyway.
Can you believe that? I tried a couple of fixes to minimize the surface area touched, but there
was no real way to solve it correctly without adding to the data structures. When I saw this
patch my first reaction was wow, but now I'll have to rework 911 again, as your patch also
touches the same files. It's totally orthogonal, so no big deal.

I mention this only to point out that the review process is awesome and, when taken seriously,
makes for a better result. This isn't just people's pet concerns. It takes time to do this.
Fortunately it's not rocket science, it's just databases. The solution to the culture problem
is "best practices". Best practices have to be practiced, and someone (Jan, as the project
lead, I'm looking at you :) needs to crack the whip and set the tone. Of course I'm assuming
that we're talking about a process to produce "production" quality code. I quote production
as that phrase has evolved considerably over the years. If master is deemed acceptable for
prototypes, proofs of concept, etc., then fine, but otherwise I'd suggest we follow Randall's
lead and work this patch on a branch first. Anyway, I'm sure you know these things; I don't
mean to prattle on.

Best Regards,

Bob

> And I do see we have some culture problems in the Apache project. We need a culture where
> useful, correct, fast code is verified and checked in, and then is improved incrementally.
> Right now we have a culture where everyone's pet concerns must be addressed before code gets checked
> in, which is demoralizing and slows things down, which is a very big problem the project has
> right now. I want your help in trying to change that.
> 
>> Asynchronous file writes
>> ------------------------
>> 
>>                Key: COUCHDB-1342
>>                URL: https://issues.apache.org/jira/browse/COUCHDB-1342
>>            Project: CouchDB
>>         Issue Type: Improvement
>>         Components: Database Core
>>           Reporter: Jan Lehnardt
>>            Fix For: 1.3
>> 
>>        Attachments: COUCHDB-1342.patch
>> 
>> 
>> This change updates the file module so that it can do
>> asynchronous writes. Basically it replies immediately
>> to the process asking to write something to the file, with
>> the position where the chunks will be written to the
>> file, while a dedicated child process keeps collecting
>> chunks and writes them to the file (and batching them
>> when possible). After issuing a series of write requests
>> to the file module, the caller can call its 'flush'
>> function which will block the caller until all the
>> chunks it requested to write are effectively written
>> to the file.
>> This maximizes the IO subsystem, as for example, while
>> the updater is traversing and modifying the btrees and
>> doing CPU bound tasks, the writes are happening in
>> parallel.
>> Originally described at http://s.apache.org/TVu
>> Github Commit: https://github.com/fdmanana/couchdb/commit/e82a673f119b82dddf674ac2e6233cd78c123554
> 
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
> 
> 
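
For readers who want a concrete picture of the scheme in the quoted issue
description (reply immediately with the file position, let a dedicated writer
batch the chunks, block only on 'flush'), here is a minimal sketch. It is
written in Python rather than Erlang and is not taken from the COUCHDB-1342
patch; the class name AsyncAppendFile and its methods are hypothetical, and a
thread plus a queue stand in for the dedicated child process.

```python
import os
import queue
import tempfile
import threading


class AsyncAppendFile:
    """Hypothetical append-only file backed by a dedicated writer thread."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self._next_pos = os.path.getsize(path)  # offset the next chunk will get
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def append(self, chunk: bytes) -> int:
        """Queue a chunk and return at once the offset where it will land."""
        with self._lock:
            pos = self._next_pos
            self._next_pos += len(chunk)
            self._queue.put(chunk)  # enqueue under the lock to preserve order
        return pos

    def flush(self) -> None:
        """Block until every chunk queued before this call is written."""
        done = threading.Event()
        self._queue.put(done)
        done.wait()

    def close(self) -> None:
        self.flush()
        self._queue.put(None)  # sentinel that stops the writer thread
        self._writer.join()
        self._file.close()

    def _write_loop(self):
        while True:
            item = self._queue.get()
            batch, markers, stop = [], [], False
            # Drain everything already queued so it goes out as one write.
            while True:
                if item is None:
                    stop = True
                    break
                if isinstance(item, threading.Event):
                    markers.append(item)
                else:
                    batch.append(item)
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
            if batch:
                self._file.write(b"".join(batch))
                self._file.flush()  # hands data to the OS; this is not an fsync
            for marker in markers:
                marker.set()  # wake callers blocked in flush()
            if stop:
                return


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.couch")
    f = AsyncAppendFile(path)
    first = f.append(b"header")  # returns immediately
    second = f.append(b"doc-1")  # caller can keep doing CPU-bound work
    f.flush()                    # blocks until both chunks are on file
    print(first, second)
    f.close()
```

The caller learns each chunk's position before the bytes reach the file, which
is what lets CPU-bound work overlap with IO. Durability is out of scope here:
the sketch flushes to the operating system only.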

