lucene-java-user mailing list archives

From Aleksey <bitterc...@gmail.com>
Subject Re: Optimizing NRT search
Date Fri, 26 Apr 2013 21:04:11 GMT
Thanks for the response, Mike. Yes, I've come upon your blog before, it's
very helpful.

I tried bigger batches; it seems the highest throughput I can get is
roughly 250 docs a second. From your blog, you updated your index at about
1 MB per second with 1 KB documents, which is 1000 docs/s, but you had a
24-core machine, while my laptop has 2 cores (and an SSD). So does that
mean the performance I'm seeing is actually better than back in 2011? (By
the way, I'm using RAMDirectory rather than MMapDirectory, but
MMapDirectory seems similar.) Interestingly, NRTCachingDirectory is about
2x faster when I'm updating one document at a time, but batches of 250
take about 1 second for both. I have not tried tuning any components yet
because I don't yet understand what exactly all the knobs do.
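
For reference, roughly what my benchmark loop looks like (a minimal
sketch against the Lucene 4.2 API; the field layout, document contents,
batch size, and the ramBufferSizeMB value are just placeholders I'm
experimenting with):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NrtBatchBench {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(
        Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42));
    iwc.setRAMBufferSizeMB(64.0); // one of the "knobs": fewer segment flushes
    IndexWriter writer = new IndexWriter(dir, iwc);
    // applyAllDeletes=true so updateDocument's deletes are visible on refresh
    SearcherManager mgr = new SearcherManager(writer, true, null);

    final int batchSize = 250;
    for (int iter = 0; iter < 100; iter++) {
      for (int i = 0; i < batchSize; i++) {
        String id = Integer.toString(iter * batchSize + i);
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("body", "some text " + id, Field.Store.NO));
        writer.updateDocument(new Term("id", id), doc); // update, not add
      }
      mgr.maybeRefresh(); // reopen once per batch, not once per document
      IndexSearcher s = mgr.acquire();
      try {
        // run a query here to verify the batch is visible
      } finally {
        mgr.release(s);
      }
    }
    mgr.close();
    writer.close();
  }
}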

Actually, perhaps I should describe my overall use case to see if I should
be using Lucene in this way at all.
My searches never need to cover the entire data set, only a tiny portion
at a time, so I was prototyping a solution that acts somewhat like a cache.
The search fleet holds lots of small Directory instances that can be
quickly loaded when necessary and evicted when not in use. Each one is
200-200K docs in size. Updates also go to individual directories, and they
typically involve tens of docs rather than hundreds or thousands.
I know that having lots of separate directories and searchers adds
overhead, but if I had everything in one index, I suppose it would be
harder to load and evict portions of it. So am I structuring my application
in a reasonable way, or is there a better way to go about it? (A rough
sketch of the structure I mean is below.)
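
To make that concrete, here is a rough sketch of the cache idea (the
partition key, LRU capacity, and eviction policy are hypothetical, and
openDirectoryFor() is a placeholder for however the per-partition
Directory actually gets loaded; real code would also guard against
closing a partition while a search is in flight):

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class PartitionCache {
  /** Writer plus searcher manager for one small partition. */
  static final class Partition {
    final IndexWriter writer;
    final SearcherManager mgr;
    Partition(Directory dir) throws IOException {
      IndexWriterConfig iwc = new IndexWriterConfig(
          Version.LUCENE_42, new StandardAnalyzer(Version.LUCENE_42));
      writer = new IndexWriter(dir, iwc);
      mgr = new SearcherManager(writer, true, null);
    }
    void close() throws IOException {
      mgr.close();
      writer.close();
    }
  }

  private static final int MAX_OPEN = 100; // hypothetical capacity

  // Access-ordered LinkedHashMap gives simple LRU eviction.
  private final Map<String, Partition> open =
      new LinkedHashMap<String, Partition>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Partition> e) {
          if (size() > MAX_OPEN) {
            try {
              e.getValue().close(); // evict: release searcher + writer
            } catch (IOException ignored) {
            }
            return true;
          }
          return false;
        }
      };

  public synchronized Partition get(String partitionId) throws IOException {
    Partition p = open.get(partitionId);
    if (p == null) {
      p = new Partition(openDirectoryFor(partitionId));
      open.put(partitionId, p);
    }
    return p;
  }

  // Placeholder: load the per-partition Directory from wherever it lives.
  private Directory openDirectoryFor(String partitionId) throws IOException {
    throw new UnsupportedOperationException("storage-specific");
  }
}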

Thank you in advance,

Aleksey

On Fri, Apr 26, 2013 at 3:46 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> Batching the updates really ought to improve overall throughput.  Have you
> tried with even bigger batches (100, 1000 docs)?
>
> But, how large is each update?  Are you changing any IndexWriter settings,
> e.g. ramBufferSizeMB?
>
> Using threads should help too: at least a separate indexing thread from
> the one calling SearcherManager.maybeRefresh (and separate threads doing
> searching).
>
> You can also check out
>
> http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html
> where I go into some detail on speeding up indexing rate and refresh speed
> with near-real-time ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Apr 25, 2013 at 11:10 PM, Aleksey <bittercold@gmail.com> wrote:
>
> > Hey guys,
> >
> > I'm new to Lucene and I was trying to estimate how fast I can make
> > updates to the index and reopen it. The behavior I'm seeing seems odd.
> > I'm using Lucene 4.2 and a SearcherManager instance that takes an index
> > writer.
> >
> > I run a loop where I update 1 document, then call maybeRefresh, acquire
> > a new searcher, and run a search to verify that the update is there. On
> > my laptop this does about 100 iterations per second.
> >
> > Then I run another loop but make 10 updates before reopening the index,
> > and this only does 10 iterations per second, proportionally less. I was
> > expecting that if I batched the updates I could get higher overall
> > throughput, but that does not seem to be the case. The size of the index
> > I'm updating doesn't make a difference either: I tried 3K and 100K
> > document sets, 1-2 KB per doc, but both produce the same update speed
> > (though I'm not calling commit in these instances).
> >
> > Can anyone point me in the right direction to investigate this, or hint
> > at how to maximize write throughput to the index while still keeping
> > <0.5 second delays in seeing the updates?
> >
> > Thank you in advance,
> >
> > Aleksey
> >
>
