couchdb-dev mailing list archives

From Scott Shumaker <sshuma...@gmail.com>
Subject Re: View Performance (was Re: The 1.0 Thread)
Date Sat, 04 Jul 2009 18:39:39 GMT
Compiling with HiPE didn't seem to make any difference in performance.  :(

On Thu, Jul 2, 2009 at 4:17 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> I'll try that out tomorrow and post the results here.
>
> On Thu, Jul 2, 2009 at 3:01 PM, Paul Davis<paul.joseph.davis@gmail.com> wrote:
>> On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>> One question, though: Why are the emitted view results stored as
>>> erlang terms, as opposed to storing the JSON returned from the view
>>> server - which is what you'll be serving to the clients anyway?
>>>
>>> If you skipped the reverse json->erlang encoding, and additionally
>>> stored a cached json copy of each document alongside the document
>>> whenever a document in couchdb was created/updated (which you could
>>> incrementally generate in a separate erlang process so you don't have
>>> to slow down write performance) - and just pass this json copy to the
>>> view, you could basically eliminate the json->erlang conversion
>>> overhead entirely (since it would only be done asynchronously).
>>>
>>> Even if you need to store the emitted view results back into erlang,
>>> you could have a special optimization case for emitting (key, doc) -
>>> because you already have the document as both erlang/json (assuming
>>> you were storing cached json copies).  And include_docs would get
>>> faster since you wouldn't need to do the json conversion there either.
>>>
>>> Just a thought.
>>>
>>
>> Premature optimization is the root of all evil? Have you tried
>> compiling CouchDB with HiPE enabled? I'm inclined to agree with you
>> that the large JSON values are probably a significant cause here.
>> Assuming your Erlang is HiPE enabled you can do something like this to
>> compile CouchDB:
>>
>>    $ ./bootstrap
>>    $ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
>>    $ make
>>    $ sudo make install
>>
>>
>>> Scott
>>>
>>> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>>> I should mention that we tend to emit (doc._id, doc) in our views - as
>>>> opposed to doc._id, null and using include_docs - because we found
>>>> that doc._id,null gave us a 30% speedup on building the views, but
>>>> cost us about the same on each additional hit to the view.
>>>>
>>>> Scott
>>>>
>>>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>>>>> We see times that are considerably worse.  We mostly have maps - very
>>>>> few reduces.  We have 40k objects, about 25 design docs, and 90 views.
>>>>>  Although we're about to change the code to auto-generate the design
>>>>> docs based on the view filters used (re: view filter patch) - see if
>>>>> that helps.
>>>>>
>>>>> Maybe it's because we have larger objects - but re-indexing a typical
>>>>> new view takes > 5 minutes (with view filtering off).  Some are worse.
>>>>>  With view filtering on some can be quite fast - some views finish in
>>>>> like 10 seconds.  Interestingly, reindexing all views takes about an
>>>>> hour - with or without view filtering.  I'm guessing that a
>>>>> substantial part of the bottleneck is erlang -> json serialization.
>>>>> Many of our objects are heavily nested structures and exceed 10k in
>>>>> size.  One other note - when we tried dropping in the optimized
>>>>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>>>>> Unfortunately, it wasn't compatible with the authentication stuff, and
>>>>> the deployment was a bit wacky, so we're holding off on that right
>>>>> now.
>>>>>
>>>>>
>>>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<damien@apache.org> wrote:
>>>>>>
>>>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>>>>
>>>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<damien@apache.org> wrote:
>>>>>>>>
>>>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>>>>
>>>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>>>>
>>>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed my toe on it,
>>>>>>>>>> see https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nice work!  I'd be interested to see what kind of performance increase we
>>>>>>>>> get from Spidermonkey 1.8.1, which comes with native JSON parsing/encoding.
>>>>>>>>> See here for details: https://developer.mozilla.org/En/Using_native_JSON .
>>>>>>>>>
>>>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>>>>
>>>>>>>> I'm not sure the new engine is such a no-brainer. One thing about the new
>>>>>>>> generation of JS VMs is we've seen greatly increased memory usage with
>>>>>>>> earlier versions. Also the startup times might be longer, or shorter.
>>>>>>>>
>>>>>>>> Though I wonder if this can be improved by forking a JS process rather than
>>>>>>>> spawning a new process.
>>>>>>>>
>>>>>>>
>>>>>>> Memory usage is a definite concern. I'm not sure I follow why startup
>>>>>>> times would be important though. Am I missing something?
>>>>>>
>>>>>> Start up time isn't a huge concern, but it is something to consider. On
>>>>>> a heavily loaded system, scripts that normally work might start to time out,
>>>>>> requiring restarting the process. Lots of restarts may start to eat lots of
>>>>>> CPU and memory IO.
>>>>>>
>>>>>> -Damien
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> -Damien
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jason Davies
>>>>>>>>>
>>>>>>>>> www.jasondavies.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
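
Editorial sketch of the two view styles compared in the thread: emitting the whole document as the view value, versus emitting null and fetching documents with include_docs at query time. This is a minimal, standalone illustration and not CouchDB's implementation. The helpers runMap and withIncludeDocs, the sample documents, and the byte counts are all hypothetical; in a real design document the map function takes only doc and calls the emit() that the view server provides.

```javascript
// Hypothetical stand-in for the view server: runs a map function over
// every document and collects the emitted rows. In CouchDB itself,
// emit() is provided by the view server rather than passed in.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Hypothetical stand-in for ?include_docs=true: attaches the source
// document to each row at query time instead of storing it in the index.
function withIncludeDocs(rows, docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  return rows.map((row) => ({ ...row, doc: byId.get(row.id) }));
}

// Style A: the whole document is the emitted value, i.e. emit(doc._id, doc).
const mapWholeDoc = (doc, emit) => emit(doc._id, doc);

// Style B: only the key is emitted, i.e. emit(doc._id, null).
const mapNullValue = (doc, emit) => emit(doc._id, null);

// Made-up sample data, sized roughly like the ~10k objects mentioned above.
const docs = [
  { _id: "a", title: "first", body: "x".repeat(10000) },
  { _id: "b", title: "second", body: "y".repeat(10000) },
];

const rowsA = runMap(mapWholeDoc, docs);
const rowsB = runMap(mapNullValue, docs);

// Serialized size as a rough proxy for how much data each style has to
// convert and store while the view is being built.
const sizeA = JSON.stringify(rowsA).length;
const sizeB = JSON.stringify(rowsB).length;
console.log({ sizeA, sizeB });

// Style B pays the document lookup on every query instead.
const rowsBWithDocs = withIncludeDocs(rowsB, docs);
console.log(rowsBWithDocs.map((row) => row.doc.title));
```

The sketch only shows where the document payload ends up in each style; it says nothing about actual timings, which depend on the serialization costs discussed in the thread.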
