incubator-couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: View Performance (was Re: The 1.0 Thread)
Date Sun, 05 Jul 2009 18:02:05 GMT
On Sat, Jul 4, 2009 at 3:26 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> Ok - here's some more detailed stats:
>
> Note that this is couch-0.9.0 with hipe enabled and the filter patch,
> on my macbook pro.
>
> ~53K db documents, ~1500 are type:restaurant
>

Use of the design doc filter for view performance should be considered
a smell. Let me see if I understand the scenario:

You have a few restaurant docs in a big database, and you've got views
to find them.

Do you have other views? They should be consolidated into a single
design document when possible.

Are there documents in your database that are not in views at all?

If you have, say, 1500 restaurants and 50k log entries in the same
database, use two databases. If you have 1500 restaurants, 1500
coffee shops, and 1500 bars, then you should consolidate your views into
one design doc. Once you've properly relaxed, your problems should be
less acute.
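
To make the consolidation advice concrete, here is a minimal sketch in
Python of one design document carrying a view per document type. CouchDB
updates all the views in a design document together, so each document is
handed to the view server once per design document rather than once per
view. The design document name `_design/places`, the view names, and the
`coffee_shop` and `bar` type values are hypothetical; only type:restaurant
appears in this thread.

```python
import json

# Hypothetical type values; only "restaurant" appears in the thread.
DOC_TYPES = ["restaurant", "coffee_shop", "bar"]


def map_source(doc_type):
    """Return JavaScript source for a map function selecting one type."""
    return (
        "function(doc) { if (doc.type === %s) { emit(doc._id, null); } }"
        % json.dumps(doc_type)
    )


def build_design_doc(doc_types):
    """Build one design document holding a view per document type."""
    return {
        "_id": "_design/places",  # hypothetical design document name
        "language": "javascript",
        "views": {t: {"map": map_source(t)} for t in doc_types},
    }


design_doc = build_design_doc(DOC_TYPES)

if __name__ == "__main__":
    # The printed JSON is the body you would PUT to the database
    # at the design document's id.
    print(json.dumps(design_doc, indent=2))
```

The sketch emits (doc._id, null); whether to emit the full doc instead is
the build-time versus read-time trade-off Scott describes further down.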

Thanks for the numbers. We think getting an Erlang view engine
installed will make a difference, maybe even with the couchjs stuff,
as we get more concurrent.
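
As a quick check on the figures quoted below, the arithmetic can be
reproduced in a few lines of Python. The inputs are the rounded numbers
from Scott's message (53K documents, timings in seconds); the variable
names are mine, and the results are only as precise as those inputs.

```python
DOCS = 53_000               # "~53K db documents"
T_BORK_NO_EMIT = 68.0       # bork.rb, returning no values
T_BORK_FIVE_EMITS = 200.0   # bork.rb, 5 values per map(doc) call
T_COUCHJS_UNFILTERED = 93.0  # couchjs, no values, no filtering
T_COUCHJS_FILTERED = 8.9     # couchjs, no values, with filtering

# Documents converted to JSON and received by the dummy server per second.
docs_per_sec = DOCS / T_BORK_NO_EMIT

# Emitted rows stored per second, using the extra time spent
# when each map(doc) call returns 5 values.
kv_inserts_per_sec = (5 * DOCS) / (T_BORK_FIVE_EMITS - T_BORK_NO_EMIT)

# Speedup from filtering on the CouchDB side (no values emitted).
filter_speedup = T_COUCHJS_UNFILTERED / T_COUCHJS_FILTERED

print(f"{docs_per_sec:.0f} docs/s")
print(f"{kv_inserts_per_sec:.0f} K/V inserts/s")
print(f"{filter_speedup:.1f}x faster with filtering")
```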

Chris


> We tested using Brian's bork.rb:
>
> no filtering:
>
> bork.rb - returning no values = 68s
> bork.rb - returning 5 values per map(doc) call = 200s
> couchjs - returning no values = 93s
> couchjs - one doc emitted per type:restaurant = 104s
>
> w/ filtering: (select ~1500 docs out of 53K)
>
> couchjs - returning no values = 8.9s
> couchjs - one doc emitted per type:restaurant = 19s
>
>
> Couple of notes:
>
> 53K docs apparently take 68s to be converted to JSON, and received by
> the dummy server (with no docs emitted) - or about 780 docs/second.
> couchjs is slower than bork.rb in this case (unsurprising -  bork.rb
> not really parsing the data)
> filtering on the couch side is an enormous win for our test case.
>
> K/V inserts - (5*53K in (200-68)s) = ~2000 per second
>
> This is a pretty big difference from Brian's results (8000/sec),
> although we're dealing with many more docs, and without comparing
> hardware specs, it's difficult to draw conclusions.
>
> On Sat, Jul 4, 2009 at 11:39 AM, Scott Shumaker<sshumaker@gmail.com> wrote:
>> Compiling with HiPE didn't seem to make any difference in performance.  :(
>>
>> On Thu, Jul 2, 2009 at 4:17 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>> I'll try that out tomorrow and post the results here.
>>>
>>> On Thu, Jul 2, 2009 at 3:01 PM, Paul Davis<paul.joseph.davis@gmail.com> wrote:
>>>> On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>>>> One question, though: Why are the emitted view results stored as
>>>>> erlang terms, as opposed to storing the JSON returned from the view
>>>>> server - which is what you'll be serving to the clients anyway?
>>>>>
>>>>> If you skipped the reverse json->erlang encoding, and additionally
>>>>> stored a cached json copy of each document alongside the document
>>>>> whenever a document in couchdb was created/updated (which you could
>>>>> incrementally generate in a separate erlang process so you don't have
>>>>> to slow down write performance) - and just pass this json copy to the
>>>>> view, you could basically eliminate the json->erlang conversion
>>>>> overhead entirely (since it would only be done asynchronously).
>>>>>
>>>>> Even if you need to store the emitted view results back into erlang,
>>>>> you could have a special optimization case for emitting (key, doc) -
>>>>> because you already have the document as both erlang/json (assuming
>>>>> you were storing cached json copies).  And include_docs would get
>>>>> faster since you wouldn't need to do the json conversion there either.
>>>>>
>>>>> Just a thought.
>>>>>
>>>>
>>>> Premature optimization is the root of all evil? Have you tried
>>>> compiling CouchDB with HiPE enabled? I'm inclined to agree with you
>>>> that the large JSON values are probably a significant cause here.
>>>> Assuming your Erlang is HiPE enabled you can do something like this to
>>>> compile CouchDB:
>>>>
>>>>    $ ./bootstrap
>>>>    $ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
>>>>    $ make
>>>>    $ sudo make install
>>>>
>>>>
>>>>> Scott
>>>>>
>>>>> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>>>>> I should mention that we tend to emit (doc._id, doc) in our views - as
>>>>>> opposed to doc._id, null and using include_docs - because we found
>>>>>> that doc._id,null gave us a 30% speedup on building the views, but
>>>>>> cost us about the same on each additional hit to the view.
>>>>>>
>>>>>> Scott
>>>>>>
>>>>>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>>>>>> We see times that are considerably worse.  We mostly have maps - very
>>>>>>> few reduces.  We have 40k objects, about 25 design docs, and 90 views.
>>>>>>>  Although we're about to change the code to auto-generate the design
>>>>>>> docs based on the view filters used (re: view filter patch) - see if
>>>>>>> that helps.
>>>>>>>
>>>>>>> Maybe it's because we have larger objects - but re-indexing a typical
>>>>>>> new view takes > 5 minutes (with view filtering off).  Some are worse.
>>>>>>>  With view filtering on some can be quite fast - some views finish in
>>>>>>> like 10 seconds.  Interestingly, reindexing all views takes about an
>>>>>>> hour - with or without view filtering.  I'm guessing that a
>>>>>>> substantial part of the bottleneck is erlang -> json serialization.
>>>>>>> Many of our objects are heavily nested structures and exceed 10k in
>>>>>>> size.  One other note - when we tried dropping in the optimized
>>>>>>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>>>>>>> Unfortunately, it wasn't compatible with the authentication stuff, and
>>>>>>> the deployment was a bit wacky, so we're holding off on that right
>>>>>>> now.
>>>>>>>
>>>>>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<damien@apache.org> wrote:
>>>>>>>>
>>>>>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<damien@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed my toe on
>>>>>>>>>>>> it, see https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Nice work!  I'd be interested to see what kind of performance increase
>>>>>>>>>>> we get from Spidermonkey 1.8.1, which comes with native JSON
>>>>>>>>>>> parsing/encoding.
>>>>>>>>>>>  See here for details:
>>>>>>>>>>> https://developer.mozilla.org/En/Using_native_JSON .
>>>>>>>>>>>
>>>>>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>>>>>>
>>>>>>>>>> I'm not sure the new engine is such a no-brainer. One thing about the new
>>>>>>>>>> generation of JS VMs is we've seen greatly increased memory usage with
>>>>>>>>>> earlier versions. Also the startup times might be longer, or shorter.
>>>>>>>>>>
>>>>>>>>>> Though I wonder if this can be improved by forking a JS process rather
>>>>>>>>>> than spawning a new process.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Memory usage is a definite concern. I'm not sure I follow why startup
>>>>>>>>> times would be important though. Am I missing something?
>>>>>>>>
>>>>>>>> Start up time isn't a huge concern, but it is something to consider. On
>>>>>>>> a heavily loaded system, scripts that normally work might start to time out,
>>>>>>>> requiring restarting the process. Lots of restarts may start to eat lots of CPU
>>>>>>>> and memory IO.
>>>>>>>>
>>>>>>>> -Damien
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Damien
>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jason Davies
>>>>>>>>>>>
>>>>>>>>>>> www.jasondavies.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
