lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <jake.man...@gmail.com>
Subject Re: Realtime & distributed
Date Fri, 09 Oct 2009 19:37:18 GMT
Jason,

  We've been running some perf/load/stress tests lately, but on a suggestion

from Ted Dunning, I've been trying to come up with a more "realistic" set of
stress
tests and indexing rates to see where NRT performs well and where it does
not,
instead of just indexing at maximum rate, looping over all docs in the test
set
and then doing them again and again.

  Once we've got a good test set, which hits on the variety of dimensions:
indexing
rate, document size, query rate while indexing, and delay-to-visibility of
indexed docs,
we'll certainly post that, as John did for the zoie tests on the zoie wiki.

  -jake

On Fri, Oct 9, 2009 at 12:29 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> Jake and John,
>
> It would be interesting and enlightening to see NRT performance
> numbers in a variety of configurations. The best way to go about
> this is to post benchmarks that others may run in their
> environment which can then be tweaked for their unique edge
> cases. I wish I had more time to work on it.
>
> -J
>
> On Thu, Oct 8, 2009 at 8:18 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> > Jason,
> >
> > On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com
> >> wrote:
> >
> >> Today near realtime search (with or without SSDs) comes at a
> >> price, that is reduced indexing speed due to continued in RAM
> >> merging. People typically hack something together where indexes
> >> are held in a RAMDir until being flushed to disk. The problem
> >> with this is, merging in the background becomes really tricky
> >> unless it's performed inside of IndexWriter (see LUCENE-1313 and
> >> IW.getReader). There is the Zoie system which uses the RAMDir
> >> solution, however it's implemented using a customized deleted
> >> doc set based on a bloomfilter backed by an inefficient RB tree
> >> which slows down queries. There's always a trade off when trying
> >> to build an NRT system, currently.
> >>
> >
> >  I'm not sure what numbers you are using to justify saying that zoie
> > "slows down queries" - latency at LinkedIn using zoie has a typical
> > median response time of 4-8ms at the searcher node level (slower
> > at the broker due to a lot of custom stuff that happens before
> > queries are actually sent to the nodex), while dealing with sustained
> > rapid indexing throughput, all with basically zero time between indexing
> > event to index visibility (ie. true real-time, not "near real time",
> unless
> > indexing events are coming in *very* fast).
> >
> >  You say there's a tradeoff, but as you should remember from your
> > time at LinkedIn, we do distributed realtime faceted search while
> > maintaining extremely low latency and still indexing sometimes more
> > than a thousand new docs a minute per node (I should dredge up
> > some new numbers to verify what that is exactly these days).
> >
> >
> > Deletes can pile up in segments so the
> >> BalancedSegmentMergePolicy could be used to remove those faster
> >> than LogMergePolicy, however I haven't tested it, and it may be
> >> trying to not do large segment merges altogether which IMO
> >> is less than ideal because query performance soon degrades
> >> (similar to an unoptimized index).
> >>
> >
> > Not optimizing all the way has shown in our case to actually be
> > *better* than the "optimal" case of a 1-segment index, at least in
> > the case of realtime indexing at rapid update pace.
> >
> >
> >  -jake
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message