incubator-couchdb-dev mailing list archives

From Thad Guidry <thadgui...@gmail.com>
Subject Re: View Performance (was Re: The 1.0 Thread)
Date Sun, 05 Jul 2009 23:42:50 GMT
And instead of just using a different physical disk to store the indexes,
consider using an SSD for your indexes.  ZFS can use these for its cache
as well.  Small writes can cause freezing when an SSD is used for the OS
layer itself, but if it is just storing CouchDB indexes then you'll be fine.
I myself have used them for Lucene indexes with great results.
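
As a rough sketch of what moving the indexes looks like in practice: CouchDB
reads the view index location from the view_index_dir setting in the
[couchdb] section of local.ini.  The paths below are made up for
illustration (I'm assuming the SSD is mounted at /mnt/ssd), so adjust them
for your own layout.

```ini
[couchdb]
; Database files stay on the existing disk (example path).
database_dir = /usr/local/var/lib/couchdb
; View indexes go to the SSD (hypothetical mount point).
view_index_dir = /mnt/ssd/couchdb/views
```

After restarting CouchDB, the views should be rebuilt in the new directory
the next time they are queried, unless the existing index files are copied
over first.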

-Thad

On Sun, Jul 5, 2009 at 5:03 PM, Chris Anderson <jchris@apache.org> wrote:

> On Sun, Jul 5, 2009 at 11:52 AM, Scott Shumaker<sshumaker@gmail.com> wrote:
> > Every document in our database is in a view.  We have a wide variety
> > of different documents - but none of them constitute the majority of
> > the docs in our database.  In a single design doc, we can't use view
> > filtering - which means our performance is far worse (not to mention
> > that we have nearly 100 views, so every view request will have to run
> > through 100 javascript functions - some of which are quite expensive -
> > and are used for offline (batch) processing only).
> >
>
> Perhaps you've got more than one application there. In that case you
> could split up your views into a small handful of design docs. The
> mechanics of the inter-process communication mean that grouping your
> views uses less i/o, so the more views you can cram into each design
> doc, the better, although offline batch stuff should be in its own
> doc.
>
> Unless your writes are coming so fast the view engine can't possibly
> keep up, you might do well to use a cron job to query index generation
> periodically, so that users aren't faced with a lot of indexing to
> wait for. Putting the view indexes on a different physical disk will
> make a very big difference in overall performance.
>
> >
> > It may very well be that the erlang view engine will help - since it
> > will cut down on the JSON -> erlang serialization, not to mention have
> > a far more efficient transport protocol.
>
> I think Erlang views will make a big difference for you because of the
> size of your objects and the possibility to avoid serialization
> overhead. We've clocked them at 2-10x faster which makes a difference.
>
> > That said, there is almost certainly
> > also a far more efficient communication protocol for talking to
> > couchjs than just communicating over stdin and stdout - not to mention
> > some ways to avoid the JSON -> erlang cost.  :)
>
> Patches are definitely welcome.
>
> You could get more view performance by running CouchDB on a
> CouchDB-Lounge cluster.
>
> http://code.google.com/p/couchdb-lounge/
>
>
>
> >
> > On Sun, Jul 5, 2009 at 11:02 AM, Chris Anderson<jchris@apache.org> wrote:
> >> On Sat, Jul 4, 2009 at 3:26 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> >>> Ok - here's some more detailed stats:
> >>>
> >>> Note that this is couch-0.9.0 with hipe enabled and the filter patch,
> >>> on my macbook pro.
> >>>
> >>> ~53K db documents, ~1500 are type:restaurant
> >>>
> >>
> >> Use of the design doc filter for view performance should be considered
> >> a smell. Let me see if I understand the scenario:
> >>
> >> You have a few restaurant docs in a big database, and you've got views
> >> to find them.
> >>
> >> Do you have other views? They should be consolidated into a single
> >> design document when possible.
> >>
> >> Are there documents in your database that are not in views at all?
> >>
> >> If you have say 1500 restaurants and 50k log entries in the same
> >> database, use two databases. If you have 1500 restaurants and 1500
> >> coffee shops and 1500 bars then you should consolidate your views into
> >> one design doc. Once you've properly relaxed your problems should be
> >> less acute.
> >>
> >> Thanks for the numbers. We think getting an Erlang view engine
> >> installed will make a difference, maybe even with the couchjs stuff,
> >> as we get more concurrent.
> >>
> >> Chris
> >>
> >>
> >>> We tested using Brian's bork.rb:
> >>>
> >>> no filtering:
> >>>
> >>> bork.rb - returning no values = 68s
> >>> bork.rb - returning 5 values per map(doc) call = 200s
> >>> couchjs - returning no values = 93s
> >>> couchjs - one doc emitted per type:restaurant = 104s
> >>>
> >>> w/ filtering: (select ~1500 docs out of 53K)
> >>>
> >>> couchjs - returning no values = 8.9s
> >>> couchjs - one doc emitted per type:restaurant = 19s
> >>>
> >>>
> >>> Couple of notes:
> >>>
> >>> 53K docs apparently take 68s to be converted to JSON, and received by
> >>> the dummy server (with no docs emitted) - or about 780 docs/second.
> >>> couchjs is slower than bork.rb in this case (unsurprising - bork.rb is
> >>> not really parsing the data)
> >>> filtering on the couch side is an enormous win for our test case.
> >>>
> >>> K/V inserts - (5*53K in (200-68)s) = ~2000 per second
> >>>
> >>> This is a pretty big difference from Brian's results (8000/sec),
> >>> although we're dealing with many more docs, and without comparing
> >>> hardware specs, it's difficult to draw conclusions.
> >>>
> >>> On Sat, Jul 4, 2009 at 11:39 AM, Scott Shumaker<sshumaker@gmail.com> wrote:
> >>>> Compiling with HiPE didn't seem to make any difference in performance. :(
> >>>>
> >>>> On Thu, Jul 2, 2009 at 4:17 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> >>>>> I'll try that out tomorrow and post the results here.
> >>>>>
> >>>>> On Thu, Jul 2, 2009 at 3:01 PM, Paul Davis<paul.joseph.davis@gmail.com> wrote:
> >>>>>> On Thu, Jul 2, 2009 at 5:50 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> >>>>>>> One question, though: Why are the emitted view results stored as
> >>>>>>> erlang terms, as opposed to storing the JSON returned from the view
> >>>>>>> server - which is what you'll be serving to the clients anyway?
> >>>>>>>
> >>>>>>> If you skipped the reverse json->erlang encoding, and additionally
> >>>>>>> stored a cached json copy of each document alongside the document
> >>>>>>> whenever a document in couchdb was created/updated (which you could
> >>>>>>> incrementally generate in a separate erlang process so you don't have
> >>>>>>> to slow down write performance) - and just pass this json copy to the
> >>>>>>> view, you could basically eliminate the json->erlang conversion
> >>>>>>> overhead entirely (since it would only be done asynchronously).
> >>>>>>>
> >>>>>>> Even if you need to store the emitted view results back into erlang,
> >>>>>>> you could have a special optimization case for emitting (key, doc) -
> >>>>>>> because you already have the document as both erlang/json (assuming
> >>>>>>> you were storing cached json copies).  And include_docs would get
> >>>>>>> faster since you wouldn't need to do the json conversion there either.
> >>>>>>>
> >>>>>>> Just a thought.
> >>>>>>>
> >>>>>>
> >>>>>> Premature optimization is the root of all evil? Have you tried
> >>>>>> compiling CouchDB with HiPE enabled? I'm inclined to agree with you
> >>>>>> that the large JSON values are probably a significant cause here.
> >>>>>> Assuming your Erlang is HiPE enabled you can do something like this to
> >>>>>> compile CouchDB:
> >>>>>>
> >>>>>>    $ ./bootstrap
> >>>>>>    $ ERLC_FLAGS="+native +inline +inline_list_funcs" ./configure
> >>>>>>    $ make
> >>>>>>    $ sudo make install
> >>>>>>
> >>>>>>
> >>>>>>> Scott
> >>>>>>>
> >>>>>>> On Thu, Jul 2, 2009 at 2:42 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> >>>>>>>> I should mention that we tend to emit (doc._id, doc) in our views - as
> >>>>>>>> opposed to doc._id, null and using include_docs - because we found
> >>>>>>>> that doc._id,null gave us a 30% speedup on building the views, but
> >>>>>>>> cost us about the same on each additional hit to the view.
> >>>>>>>>
> >>>>>>>> Scott
> >>>>>>>>
> >>>>>>>> On Thu, Jul 2, 2009 at 2:15 PM, Scott Shumaker<sshumaker@gmail.com> wrote:
> >>>>>>>>> We see times that are considerably worse.  We mostly have maps - very
> >>>>>>>>> few reduces.  We have 40k objects, about 25 design docs, and 90 views.
> >>>>>>>>> Although we're about to change the code to auto-generate the design
> >>>>>>>>> docs based on the view filters used (re: view filter patch) - see if
> >>>>>>>>> that helps.
> >>>>>>>>>
> >>>>>>>>> Maybe it's because we have larger objects - but re-indexing a typical
> >>>>>>>>> new view takes > 5 minutes (with view filtering off).  Some are worse.
> >>>>>>>>> With view filtering on some can be quite fast - some views finish in
> >>>>>>>>> like 10 seconds.  Interestingly, reindexing all views takes about an
> >>>>>>>>> hour - with or without view filtering.  I'm guessing that a
> >>>>>>>>> substantial part of the bottleneck is erlang -> json serialization.
> >>>>>>>>> Many of our objects are heavily nested structures and exceed 10k in
> >>>>>>>>> size.  One other note - when we tried dropping in the optimized
> >>>>>>>>> 'main.js' posted on the mailing list, we saw an overall 20% speedup.
> >>>>>>>>> Unfortunately, it wasn't compatible with the authentication stuff, and
> >>>>>>>>> the deployment was a bit wacky, so we're holding off on that right
> >>>>>>>>> now.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Jul 2, 2009 at 11:30 AM, Damien Katz<damien@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Jul 2, 2009, at 1:55 PM, Paul Davis wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On Thu, Jul 2, 2009 at 1:29 PM, Damien Katz<damien@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Jul 2, 2009, at 1:16 PM, Jason Davies wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 2 Jul 2009, at 15:38, Brian Candler wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> For some fruit that was so low-hanging that I nearly stubbed my
> >>>>>>>>>>>>>> toe on it, see https://issues.apache.org/jira/browse/COUCHDB-399
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Nice work!  I'd be interested to see what kind of performance
> >>>>>>>>>>>>> increase we get from Spidermonkey 1.8.1, which comes with native
> >>>>>>>>>>>>> JSON parsing/encoding.
> >>>>>>>>>>>>>  See here for details:
> >>>>>>>>>>>>> https://developer.mozilla.org/En/Using_native_JSON .
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Rumour has it 1.8.1 will be released any time soon (TM)
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not sure the new engine is such a no-brainer. One thing about
> >>>>>>>>>>>> the new generation of JS VMs is we've seen greatly increased memory
> >>>>>>>>>>>> usage with earlier versions. Also the startup times might be
> >>>>>>>>>>>> longer, or shorter.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Though I wonder if this can be improved by forking a JS process
> >>>>>>>>>>>> rather than spawning a new process.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Memory usage is a definite concern. I'm not sure I follow why
> >>>>>>>>>>> startup times would be important though. Am I missing something?
> >>>>>>>>>>
> >>>>>>>>>> Start up time isn't a huge concern, but it is something to
> >>>>>>>>>> consider. On a heavily loaded system, scripts that normally work
> >>>>>>>>>> might start to time out, requiring restarting the process. Lots of
> >>>>>>>>>> restarts may start to eat lots of cpu and memory IO.
> >>>>>>>>>>
> >>>>>>>>>> -Damien
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> -Damien
> >>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Jason Davies
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> www.jasondavies.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Chris Anderson
> >> http://jchrisa.net
> >> http://couch.io
> >>
> >
>
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>
