lucene-java-user mailing list archives

From Jason Rutherglen <>
Subject Re: Realtime & distributed
Date Fri, 09 Oct 2009 19:29:23 GMT
Jake and John,

It would be interesting and enlightening to see NRT performance
numbers in a variety of configurations. The best way to go about
this is to post benchmarks that others can run in their own
environments and then tweak for their unique edge cases. I wish I
had more time to work on it.
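For anyone wanting to try the IndexWriter.getReader path that comes up in the
quoted discussion below, here is a minimal sketch against the Lucene 2.9-era
API. The class name, field name, and analyzer choice are illustrative, not
anything from zoie or LinkedIn's setup:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NrtSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
            new StandardAnalyzer(Version.LUCENE_29),
            IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        doc.add(new Field("body", "realtime search",
            Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);

        // getReader() makes buffered, uncommitted docs searchable without
        // a full commit to stable storage -- the LUCENE-1313 NRT path.
        IndexReader reader = writer.getReader();
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run queries; call writer.getReader() again after more updates
        searcher.close();
        reader.close();
        writer.close();
    }
}
```

A benchmark along these lines would mainly vary the reopen interval and the
indexing rate, which is where the indexing-speed trade-off shows up.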


On Thu, Oct 8, 2009 at 8:18 PM, Jake Mannix <> wrote:
> Jason,
> On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen <
>> wrote:
>> Today near-realtime search (with or without SSDs) comes at a
>> price: reduced indexing speed due to continued in-RAM merging.
>> People typically hack something together where indexes are held
>> in a RAMDir until being flushed to disk. The problem with this is
>> that background merging becomes really tricky unless it's
>> performed inside of IndexWriter (see LUCENE-1313 and
>> IW.getReader). There is the Zoie system, which uses the RAMDir
>> solution; however, it's implemented using a customized deleted-doc
>> set based on a Bloom filter backed by an inefficient RB tree,
>> which slows down queries. There's always a trade-off when trying
>> to build an NRT system, currently.
>  I'm not sure what numbers you are using to justify saying that zoie
> "slows down queries" - latency at LinkedIn using zoie has a typical
> median response time of 4-8ms at the searcher node level (slower
> at the broker due to a lot of custom stuff that happens before
> queries are actually sent to the nodes), while dealing with sustained
> rapid indexing throughput, all with basically zero time between an
> indexing event and index visibility (i.e. true real-time, not "near
> real-time", unless indexing events are coming in *very* fast).
>  You say there's a tradeoff, but as you should remember from your
> time at LinkedIn, we do distributed realtime faceted search while
> maintaining extremely low latency and still indexing sometimes more
> than a thousand new docs a minute per node (I should dredge up
> some new numbers to verify what that is exactly these days).
>> Deletes can pile up in segments, so the
>> BalancedSegmentMergePolicy could be used to remove those faster
>> than LogMergePolicy; however, I haven't tested it, and it may be
>> trying to avoid large segment merges altogether, which IMO
>> is less than ideal because query performance soon degrades
>> (similar to an unoptimized index).
> Not optimizing all the way has, in our case, actually proven to be
> *better* than the "optimal" case of a 1-segment index, at least
> when indexing in realtime at a rapid update pace.
>  -jake
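To make the merge-policy point above concrete, here is a hedged sketch of
wiring a merge policy into an IndexWriter with the Lucene 2.9-era API. The
stock LogByteSizeMergePolicy is shown because BalancedSegmentMergePolicy
shipped separately at the time; it would be substituted in the same place.
The numeric values are placeholders, not tuned settings:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MergePolicySketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(new RAMDirectory(),
            new StandardAnalyzer(Version.LUCENE_29),
            IndexWriter.MaxFieldLength.UNLIMITED);

        // The merge policy governs how (and how fast) segments carrying
        // accumulated deletes get merged away.
        LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy(writer);
        policy.setMergeFactor(10);   // same-level segments before a merge fires
        policy.setMaxMergeMB(1024);  // leave segments above this size alone
        writer.setMergePolicy(policy);

        writer.close();
    }
}
```

Capping the maximum merge size is exactly the behavior under debate here: it
keeps deletes from forcing huge merges, at the cost of never fully collapsing
the index toward one segment.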
