lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Optimizing NRT search
Date Sat, 27 Apr 2013 12:02:42 GMT
On Fri, Apr 26, 2013 at 5:04 PM, Aleksey <bittercold@gmail.com> wrote:
> Thanks for the response, Mike. Yes, I've come upon your blog before, it's
> very helpful.
>
> I tried bigger batches; it seems the highest throughput I can get is
> roughly 250 docs a second. From your blog, you updated your index at about
> 1 MB per second with 1 KB documents, which is 1000 docs/sec, but you had a
> 24-core machine, while my laptop has 2 cores (and an SSD). So does that mean
> the performance I'm seeing is actually better than back in 2011? (By the way,
> I'm using RAMDirectory rather than MMap, but MMap seems similar.)

Be careful with RAMDir ... it's very GC heavy as the index gets larger
since it breaks each file into 1K byte[]s.  It's best for smallish
indices.
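
If the per-directory indices aren't small, the usual alternative is
MMapDirectory (what FSDirectory.open picks on 64-bit platforms) wrapped in
NRTCachingDirectory, so newly flushed files are cached in RAM while the bulk
of the index stays off-heap.  A rough sketch against the 4.x APIs (the path
and size limits are placeholders, not recommendations):

  Directory dir = new NRTCachingDirectory(
      FSDirectory.open(new File("/path/to/index")),  // placeholder path
      5.0,    // maxMergeSizeMB: cache newly flushed/merged files up to this size
      60.0);  // maxCachedMB: total RAM budget for cached files
  IndexWriter writer = new IndexWriter(dir,
      new IndexWriterConfig(Version.LUCENE_42,
          new StandardAnalyzer(Version.LUCENE_42)));
  DirectoryReader reader = DirectoryReader.open(writer, true);  // NRT reader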

Your tests are all with one thread?  (My tests were using multiple
threads on the 24 core machine).  So on a laptop with one thread, 250
docs/sec where each doc is 1-2 KB seems reasonable.
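
(IndexWriter is safe to share across threads, so if you want to see how far
the laptop can go, a sketch as simple as this will usually scale until the
CPUs or the SSD saturate; the pool size is arbitrary:)

  final IndexWriter writer = new IndexWriter(dir, iwc);  // shared writer
  ExecutorService pool = Executors.newFixedThreadPool(2);
  for (final Document doc : docs) {
    pool.submit(new Runnable() {
      public void run() {
        try {
          writer.addDocument(doc);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    });
  }
  pool.shutdown();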

Still, it's odd that you don't see larger gains from batching up the
changes between reopens.
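
Roughly what I mean, as a sketch (assuming a SearcherManager on top of the
writer; the batch size and id field are placeholders):

  SearcherManager mgr = new SearcherManager(writer, true, null);
  for (List<Document> batch : batches) {  // e.g. ~250 docs per batch
    for (Document doc : batch) {
      writer.updateDocument(new Term("id", doc.get("id")), doc);
    }
    mgr.maybeRefresh();  // one reopen per batch instead of one per document
  }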

> The interesting thing is that NRTCachingDirectory is about 2x faster when I'm
> updating one document at a time, but batches of 250 take about 1 second for both.
> I have not tried tuning any components yet because I don't yet understand
> what exactly all the knobs do.

Well if you're using RAMDir then NRTCachingDir really should not be
helping much at all!

> Actually, perhaps I should describe my overall use case to see if I should
> be using Lucene in this way at all.
> My searches never need to be over the entire data set, only over a tiny
> portion at a time, so I was prototyping a solution that acts kind of like a
> cache. The search fleet holds lots of small Directory instances that can be
> quickly loaded up when necessary and evicted when not in use. Each one is
> 200-200K docs in size. Updates also happen to individual directories, and
> they are typically in the tens of docs rather than hundreds or thousands.
> I know that having lots of separate directories and searchers is overhead,
> but if I had everything in one, I suppose it would be harder to load and
> evict portions of it. So am I structuring my application in a reasonable way,
> or is there a better way to go about it?

This approach should work.  You use MultiReader to search across them?
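
E.g. something like this sketch (how you obtain each per-partition reader
depends on your cache):

  DirectoryReader r1 = DirectoryReader.open(dir1);
  DirectoryReader r2 = DirectoryReader.open(dir2);
  MultiReader multi = new MultiReader(r1, r2);  // closes sub-readers on close
  IndexSearcher searcher = new IndexSearcher(multi);
  TopDocs hits = searcher.search(query, 10);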

You could also use a single reader + filter, or a single reader and
periodically delete the docs to be evicted.
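
For example (sketch; the "tenant" field is just one way you might tag docs
by partition):

  // Restrict each search to one logical partition:
  Filter f = new QueryWrapperFilter(new TermQuery(new Term("tenant", "abc")));
  TopDocs hits = searcher.search(query, f, 10);

  // Or evict a partition outright by deleting its docs:
  writer.deleteDocuments(new Term("tenant", "abc"));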

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

