incubator-couchdb-user mailing list archives

From Torstein Krause Johansen <torsteinkrausew...@gmail.com>
Subject Re: Complex queries & results
Date Wed, 08 Jun 2011 04:49:02 GMT
On 07/06/11 14:21, Torstein Krause Johansen wrote:

> On 06/06/11 22:09, Sean Copenhaver wrote:

>> https://gist.github.com/1010318
>>
>> I tried this out with 10 docs fitting your example structure and with a
>> plain query (no grouping, no filtering, reduce on) I get back:
>>
>> { John: 4, Jane: 6 }
>
> Looks spot on! Thank you _so_ much for doing this.
>
> I'm really curious how this performs. I will be-siege my couch with bulk
> updates, giving it a big-ish data set, while simultaneously be-sieging it
> with GETs querying the map/reduce you've created. Will be very
> interesting.

I started by using siege to post 1000s of documents with 14 fields &
values (the actual data my application will be using) and let it run
till I got a fair data set. After shrinking the DB, now ~710,000
documents, from 4.2GB to ~360MB, the queries went from ~8s to ~0.05s.
Fantastic.

I then unleashed siege again, this time with 100 parallel threads, each
creating 200 new documents through the bulk endpoint. (siege somehow
didn't want to work with my initial 1000-document .json file, so I had
to reduce it to 200 to keep siege from choking on it.) At the same time
I ran wget, creating random data through the normal document endpoint.
The query times immediately started to climb: 1s, 2s, 3s ... 80s, with
no sign of stopping.

To see whether the simultaneous writes and reads were causing the
longer query times, I stopped siege and wget on my test machine
(a different host, going through the same network switch).

Since there had been quite a number of new documents, couch started
its checkpointed view update, which left it unable to respond to any
queries for around 90s.

The query times then dropped, stabilising at 0.06 to 0.08s when
querying the DB, now ~800,000 documents, with result sets containing
~50 keys with ~2000 counts each. Great!

The climbing query times when doing so many updates are not a real
concern for me, as I'll put a queue in front of couch which buffers up
the incoming write requests and fires off a bulk update every 30
seconds or so. Couch seems more than fast enough write-wise as long as
the documents are provided in bulk.
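
A minimal sketch of what such a write buffer could look like, in Python.
This is an illustration under assumptions, not the queue actually used:
the class name BulkWriteBuffer, the helper post_bulk, the max_docs cap
and the example database URL are all made up for this sketch. The only
CouchDB-specific part is the _bulk_docs endpoint, which accepts a JSON
body of the form {"docs": [...]}.

```python
import json
import threading
import time
import urllib.request


def post_bulk(db_url, docs):
    """POST docs to the database's _bulk_docs endpoint; return the parsed reply."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    req = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


class BulkWriteBuffer:
    """Collects documents and hands them to `send` as one batch per flush.

    `send` is any callable taking a list of documents (hypothetical
    interface, chosen so the buffer can be tested without a server).
    """

    def __init__(self, send, max_docs=200):
        self._send = send
        self._max_docs = max_docs  # arbitrary cap, an assumption of this sketch
        self._docs = []
        self._lock = threading.Lock()

    def add(self, doc):
        """Queue one document; flush early if the cap is reached."""
        with self._lock:
            self._docs.append(doc)
            full = len(self._docs) >= self._max_docs
        if full:
            self.flush()

    def flush(self):
        """Send everything queued so far; return how many docs were sent."""
        with self._lock:
            batch, self._docs = self._docs, []
        if batch:
            self._send(batch)
        return len(batch)

    def run_forever(self, interval=30.0):
        """Flush every `interval` seconds; meant to run in a background thread."""
        while True:
            time.sleep(interval)
            self.flush()


# Possible wiring (URL and database name are placeholders):
#   buf = BulkWriteBuffer(lambda docs: post_bulk("http://localhost:5984/mydb", docs))
#   threading.Thread(target=buf.run_forever, daemon=True).start()
#   buf.add({"name": "John"})
```

Swapping the lists under the lock and sending outside it means incoming
writes are not blocked while a batch is on its way to couch. A real
queue would also need to decide what to do when a bulk request fails.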

What does worry me, though, is that couch doesn't answer any query
while it's doing its view updates. Even with a nice cache server in
front which can serve old content till couch is finished updating its 
views, I still find it a bit unsettling. Do you have any tips for me here?

Cheers,

-Torstein
