couchdb-dev mailing list archives

From Scott Shumaker <sshuma...@gmail.com>
Subject Re: View Performance (was Re: The 1.0 Thread)
Date Thu, 02 Jul 2009 21:50:30 GMT
One question, though: Why are the emitted view results stored as
erlang terms, as opposed to storing the JSON returned from the view
server - which is what you'll be serving to the clients anyway?

If you skipped the reverse json->erlang decoding, and additionally
stored a cached json copy of each document alongside the document
whenever a document in couchdb was created/updated (which you could
generate incrementally in a separate erlang process so you don't
slow down writes), and just passed this json copy to the view server,
you could basically eliminate the erlang<->json conversion overhead
from view building entirely (since the encoding would only be done
asynchronously).

Even if you need to store the emitted view results as erlang terms,
you could special-case emit(key, doc), because you would already have
the document as both erlang and json (assuming you were storing cached
json copies).  And include_docs would get faster, since you wouldn't
need to do the json conversion there either.
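
A minimal sketch of the idea above, in Python rather than Erlang and
not based on CouchDB's actual storage code. DocStore, save and view_row
are hypothetical names used only for illustration, and the JSON copy is
built inline here instead of in a separate process:

```python
import json


class DocStore:
    """Hypothetical in-memory store that keeps a cached JSON copy per document."""

    def __init__(self):
        self._docs = {}  # doc id -> native (decoded) document
        self._json = {}  # doc id -> cached JSON text of the same document

    def save(self, doc):
        # Encode once at write time. The proposal is to do this in a
        # separate process so writes are not slowed down; it is done
        # inline here to keep the sketch short.
        self._docs[doc["_id"]] = doc
        self._json[doc["_id"]] = json.dumps(doc, sort_keys=True)

    def view_row(self, doc_id):
        # Special case for emit(key, doc): splice the cached JSON text into
        # the row instead of re-encoding the document on every read.
        return '{"id":%s,"key":%s,"value":%s}' % (
            json.dumps(doc_id),
            json.dumps(doc_id),
            self._json[doc_id],
        )
```

With this layout the per-read cost for a (key, doc) row is string
concatenation rather than a full encode of a large nested document.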

Just a thought.

Scott
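
The earlier message quoted below compares emitting (doc._id, doc) with
emitting (doc._id, null) and using include_docs. A toy model of that
trade-off, in Python and with hypothetical function names (it does not
reproduce CouchDB's view engine or the timings reported in this thread):

```python
def build_view(docs, emit_doc):
    # Map step: one row per document, keyed by _id.
    # emit_doc=True models emit(doc._id, doc);
    # emit_doc=False models emit(doc._id, null).
    rows = [(d["_id"], d if emit_doc else None) for d in docs]
    return sorted(rows, key=lambda row: row[0])


def query_view(rows, docs_by_id, include_docs=False):
    # With include_docs, every row costs an extra document lookup
    # at read time instead of at index-build time.
    out = []
    for key, value in rows:
        row = {"id": key, "key": key, "value": value}
        if include_docs:
            row["doc"] = docs_by_id[key]
        out.append(row)
    return out
```

Both variants return the same documents; they differ in whether the
document is copied into the index when the view is built or fetched on
each query.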

On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> I should mention that we tend to emit (doc._id, doc) in our views - as
> opposed to doc._id, null and using include_docs - because we found
> that doc._id,null gave us a 30% speedup on building the views, but
> cost us about the same on each additional hit to the view.
>
> Scott
>
> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
>> We see times that are considerably worse.  We mostly have maps - very
>> few reduces.  We have 40k objects, about 25 design docs, and 90 views.
>>  Although we're about to change the code to auto-generate the design
>> docs based on the view filters used (re: view filter patch) - see if
>> that helps.
>>
>> Maybe it's because we have larger objects - but re-indexing a typical
>> new view takes > 5 minutes (with view filtering off).  Some are worse.
>>  With view filtering on some can be quite fast - some views finish in
>> like 10 seconds.  Interestingly, reindexing all views takes about an
>> hour - with or without view filtering.  I'm guessing that a
>> substantial part of the bottleneck is erlang -> json serialization.
>> Many of our objects are heavily nested structures and exceed 10k in
>> size.  One other note - when we tried dropping in the optimized
>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
>> Unfortunately, it wasn't compatible with the authentication stuff, and
>> the deployment was a bit wacky, so we're holding off on that right
>> now.
>>
>>
>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<damien@apache.org> wrote:
>>>
>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
>>>
>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<damien@apache.org> wrote:
>>>>>
>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
>>>>>
>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
>>>>>>
>>>>>>> For some fruit that was so low-hanging that I nearly stubbed my toe
>>>>>>> on it,
>>>>>>> see https://issues.apache.org/jira/browse/COUCHDB-399
>>>>>>
>>>>>>
>>>>>> Nice work!  I'd be interested to see what kind of performance
>>>>>> increase we get from Spidermonkey 1.8.1, which comes with native
>>>>>> JSON parsing/encoding.  See here for details:
>>>>>> https://developer.mozilla.org/En/Using_native_JSON .
>>>>>>
>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
>>>>>
>>>>> I'm not sure the new engine is such a no-brainer. One thing about the
>>>>> new generation of JS VMs is we've seen greatly increased memory usage
>>>>> with earlier versions. Also the startup times might be longer, or shorter.
>>>>>
>>>>> Though I wonder if this can be improved by forking a JS process rather
>>>>> than spawning a new process.
>>>>>
>>>>
>>>> Memory usage is a definite concern. I'm not sure I follow why startup
>>>> times would be important though. Am I missing something?
>>>
>>> Start up time isn't a huge concern, but it is something to consider. On
>>> a heavily loaded system, scripts that normally work might start to time out,
>>> requiring restarting the process. Lots of restarts may start to eat lots of
>>> CPU and memory IO.
>>>
>>> -Damien
>>>
>>>
>>>>
>>>>> -Damien
>>>>>
>>>>>> --
>>>>>> Jason Davies
>>>>>>
>>>>>> www.jasondavies.com
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>
