incubator-couchdb-user mailing list archives

From Matt Goodall <matt.good...@gmail.com>
Subject Re: Performance issue with changes API
Date Wed, 12 Oct 2011 16:21:46 GMT
On 12 October 2011 16:33, Robert Newson <rnewson@apache.org> wrote:
> ooh, my math is way off, ignore ;)

No, I think you were right, assuming your 2000 rows/s figure was for
3.5M rows in 30 minutes (3,500,000 rows / 1,800 s ≈ 1,944 rows/s).

>
> On 12 October 2011 16:32, Robert Newson <rnewson@apache.org> wrote:
>> The 3.5M row response is not formed in memory. :) It's done line by line.

Yeah, you're right, sorry for the misinformation. I misread the code
when I quickly scanned it :/. It clearly streams the response, even
for feed=normal.

>>
>> that said, that's almost 2000 rows per second, which doesn't sound
>> that bad to me.

I'm seeing over 20,000 rows/s from a CouchDB on localhost running on
Linux. That's quite a difference, and I wouldn't describe my laptop as
especially fast either.

- Matt


>>
>> B.
>>
>> On 12 October 2011 16:26, Matt Goodall <matt.goodall@gmail.com> wrote:
>>> On 12 October 2011 14:22, Arnaud Bailly <arnaud.oqube@gmail.com> wrote:
>>>> Hello,
>>>> We have started experimenting with CouchDB as our backend, being
>>>> especially interested in the changes API, and we have run into
>>>> performance issues. We have a DB containing around 3.5M docs, each
>>>> about 10K in size. Running the following query on the database:
>>>>
>>>> http://192.168.1.166:5984/infowarehouse/_changes?since=0
>>>>
>>>> takes about 30 minutes on a 4-core Windows 7 box, which seems rather long.
>>>>
>>>> Is this expected? Are there any benchmarks available for this API?
>>>
>>> I'm not too surprised - CouchDB is probably building a massive JSON
>>> changes response containing 3.5M items ;-). Instead you should use
>>> the since=<start> and limit=<batch-size> args together to get the
>>> items in sensibly-sized batches, ending when you see no more items
>>> in the response.
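
A rough sketch of that batching loop (hypothetical code, not from the
thread; it assumes Python with the requests library, reuses the
database URL posted above, and BATCH is an arbitrary choice):

    import requests

    BASE = "http://192.168.1.166:5984/infowarehouse"
    BATCH = 10000  # arbitrary batch size, tune as needed

    since = 0
    while True:
        resp = requests.get(
            BASE + "/_changes",
            params={"since": since, "limit": BATCH},
        )
        resp.raise_for_status()
        body = resp.json()
        if not body["results"]:
            break  # empty batch: no more changes to fetch
        for change in body["results"]:
            pass  # handle each change row here
        since = body["last_seq"]  # resume from the end of this batch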
>>>
>>> Alternatively, you might be able to use feed=continuous with timeout=0
>>> to stream the changes as fast as possible. The timeout=0 arg is just
>>> there to shut down the changes feed as soon as you've seen everything.
>>> My laptop takes about 50s to stream about 1M changes using this
>>> technique (sending the output to /dev/null).
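
And a similarly hypothetical Python sketch of the feed=continuous
approach; the continuous feed emits one JSON object per line, followed
by a final "last_seq" line once the timeout fires:

    import json
    import requests

    BASE = "http://192.168.1.166:5984/infowarehouse"

    resp = requests.get(
        BASE + "/_changes",
        params={"feed": "continuous", "since": 0, "timeout": 0},
        stream=True,  # read the response incrementally
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip heartbeat / blank lines
        row = json.loads(line)
        if "last_seq" in row:
            break  # server has sent everything it currently has
        # each row carries "seq", "id" and "changes"; process it here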
>>>
>>> - Matt
>>>
>>
>
