couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: Tail Append Headers
Date Tue, 19 May 2009 16:50:29 GMT
As I think about it, I'm not surprised you aren't getting better
numbers with delayed updates, which amortize the cost of fsync across
all the docs being updated per second. But getting half the performance
seems wrong. I'm hoping it's something easy to fix, but we'll need to
run a profiler to be sure.
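
For reference, here is the "half the performance" arithmetic worked out
from the figures quoted below. This is only a sketch: the dictionary
keys and the relative_throughput helper are names made up for
illustration, and the numbers are copied from the quoted benchmark.

```python
# Throughput figures (docs/sec) copied from the benchmark quoted below.
append_only = {
    "hovercraft run 1": 3621.328800974775,
    "hovercraft run 2": 3635.201032978726,
    "curl/bash": 2285.7,
}
trunk = {
    "hovercraft run 1": 7554.146992520337,
    "hovercraft run 2": 7673.222028132334,
    "curl/bash": 3417.6,
}


def relative_throughput(new, old):
    """Return new/old throughput for each benchmark name (hypothetical helper)."""
    return {name: new[name] / old[name] for name in new}


ratios = relative_throughput(append_only, trunk)
for name, ratio in ratios.items():
    print(f"{name}: append-only runs at {ratio:.0%} of trunk")
```

On these numbers the hovercraft runs come out at a little under half of
trunk, while the curl/bash run is closer to two thirds.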

I'd like to see benchmarks across a variety of loads, and view build
behavior too. For one thing, with full commits on individual doc
updates, the new code should be much faster. I also think view
refreshes could be slower or faster: slower because the docs they are
mapping are more sparse on disk, but faster because they require no
fsync (if you are using a filesystem that guarantees ordered
sequential writes).
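
To make the full commit vs. delayed commit distinction concrete, here
is a minimal sketch that counts fsync calls when appending the same
docs one at a time and in a single batch. It is not CouchDB code: the
append_docs function, the file names, and the doc format are all
assumptions for illustration.

```python
import os
import tempfile


def append_docs(path, docs, batch_size):
    """Append docs to path, fsyncing once per batch. Returns the fsync count."""
    fsyncs = 0
    with open(path, "ab") as f:
        for start in range(0, len(docs), batch_size):
            for doc in docs[start:start + batch_size]:
                f.write(doc)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # ask the OS to push it to disk
            fsyncs += 1
    return fsyncs


# Made-up documents, just to have something to write.
docs = [b'{"n": %d}\n' % i for i in range(200)]

with tempfile.TemporaryDirectory() as tmp:
    # One fsync per document, like a full commit on every update.
    full_commit = append_docs(os.path.join(tmp, "full.db"), docs, batch_size=1)
    # One fsync for the whole batch, like a delayed commit.
    delayed = append_docs(os.path.join(tmp, "delayed.db"), docs, batch_size=len(docs))

print(full_commit, delayed)
```

The bytes written are the same in both cases; only the number of
fsyncs differs, which is the cost that batching spreads out.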

Also, if performance generally turns out to be slower all around,
we'll have to discuss whether the pure tail append change is actually
worth it. Maybe we can tail append headers with the old design too,
with them only ever being used when the front header is bad. The only
problem is that, without implementing the current design, I don't know
of a workable way to tell a valid header from something that merely
happens to look like a couchdb file header, such as a couchdb file
attached inside a document in a live db, or an intentional attack.
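
A small sketch of that ambiguity, assuming a naive recovery that scans
backwards for a marker. The MAGIC bytes, the header contents, and the
last_header_offset function are hypothetical and do not reflect the
real couchdb file format.

```python
# Hypothetical marker; NOT the real couchdb header format.
MAGIC = b"HYPOTHETICAL-HDR:"


def last_header_offset(data: bytes) -> int:
    """Naive recovery: return the offset of the last thing that looks like a header."""
    return data.rfind(MAGIC)


# A genuine header appended at the tail of the db file.
genuine = MAGIC + b"root=42;"
# An attachment written later that is itself a db file (or a forgery),
# so its bytes also begin with the marker.
attachment = MAGIC + b"root=666;"

db_file = b"docs..." + genuine + b"more docs..." + attachment

true_offset = db_file.find(genuine)
found = last_header_offset(db_file)
picked_wrong = found != true_offset

print(true_offset, found, picked_wrong)
```

Here the scan lands on the bytes inside the attachment rather than on
the genuine header, because nothing in the marker alone distinguishes
the two.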

-Damien

On May 18, 2009, at 7:43 PM, Chris Anderson wrote:

> On Mon, May 18, 2009 at 10:59 AM, Damien Katz <damien@apache.org>  
> wrote:
>> Feedback on all this welcome. Please try out the branch to shake out
>> any bugs or performance problems that might be lurking.
>>
>
> The code looks simpler, which is a nice surprise considering the
> storage is actually more robust.
>
> Here are comparative benchmarks on my MacBook. Two runs of
> hovercraft:lightning() which factors out all http / json overhead, and
> inserts small documents in batches of 1000. I've also done a round of
> running my curl/bash benchmark script to insert 100k docs (with
> sequential ids)
>
> append only:
> 2> hovercraft:lightning().
> Inserted 100000 docs in 27.614173 seconds with batch size of 1000.
> (3621.328800974775 docs/sec)
> 3> hovercraft:lightning().
> Inserted 100000 docs in 27.508795 seconds with batch size of 1000.
> (3635.201032978726 docs/sec)
>
> curl/bash: 2285.7 docs/sec
>
> trunk:
> 2> hovercraft:lightning().
> Inserted 100000 docs in 13.237762 seconds with batch size of 1000.
> (7554.146992520337 docs/sec)
> 3> hovercraft:lightning().
> Inserted 100000 docs in 13.032335 seconds with batch size of 1000.
> (7673.222028132334 docs/sec)
>
> curl/bash: 3417.6 docs/sec
>
> So the preliminary results are that the append-only (on my particular
> hardware with a contrived micro-benchmark) is about twice as slow.
>
> It's a matter of priorities. Do we want absolute robustness, or do we
> want more performance? Also, the append-only stuff is brand-new and
> could conceivably be optimized. I would not be surprised at all to see
> it get faster than trunk, with enough tuning.
>
> Chris
>
> -- 
> Chris Anderson
> http://jchrisa.net
> http://couch.io

