lucene-java-user mailing list archives

From Desidero <desid...@gmail.com>
Subject Re: Query performance in Lucene 4.x
Date Thu, 03 Oct 2013 03:39:40 GMT
You are correct: I'm using a MultiReader over multiple IndexReaders
("shards") that contain one segment each, essentially doing what Lucene does
with a single IndexReader and multiple segments. It's done this way for two
reasons:

1) By using multiple single-segment "shards", I can completely control the
level of parallelism for my queries. It's possible to futz with
MergePolicies to control the number of segments in an index, but most of
the controls have variable outcomes (e.g. setting someAttribute allows for
up to n segments, but possibly fewer). I was hoping to avoid it, but I might
end up writing a custom merge policy before all of this is over... (There's
a sketch of the shard setup after these two points.)

2) Unfortunately, the process we use to clean our data for indexing is
extremely CPU-intensive. Since we split the data into multiple shards, we
can re-create a single shard in less time if a problem arises, because
there's less to index. It also allows us to exploit parallelism across
machines rather than just across threads when generating multiple shards
simultaneously. I generally use 128 logical CPU cores (2 servers) when
reindexing for tests, but we would benefit from using more if we had to do
it quickly in production for whatever reason.
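
For anyone curious, here's a minimal sketch of that setup, assuming Lucene
4.x; the shard paths, shard count, and pool size are placeholders, not our
actual configuration:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearcher {
        public static IndexSearcher open(int numShards) throws Exception {
            // One reader per single-segment shard directory.
            List<IndexReader> shards = new ArrayList<IndexReader>();
            for (int i = 0; i < numShards; i++) {
                shards.add(DirectoryReader.open(
                    FSDirectory.open(new File("/index/shard" + i))));
            }
            // MultiReader presents the shards as one logical index. Passing
            // an executor makes IndexSearcher submit one task per leaf
            // (segment), so one segment per shard means exactly numShards
            // tasks per query.
            MultiReader reader = new MultiReader(
                shards.toArray(new IndexReader[shards.size()]));
            ExecutorService pool = Executors.newFixedThreadPool(numShards);
            return new IndexSearcher(reader, pool);
        }
    }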

As far as the IndexSearcher extension is concerned, it's actually pretty
simple. I've done some testing on my own and QA is going to get their hands
on it soon, so hopefully they won't find anything that I didn't!


On Wed, Oct 2, 2013 at 9:52 PM, Vitaly Funstein <vfunstein@gmail.com> wrote:

> Hmm, I guess your IndexSearcher is backed by a MultiReader which operates
> on these "shards" you're referring to, which are supposed to be
> single-segment indexes? If so, this topology sounds fairly equivalent, at
> least in concept and perhaps in performance as well, to the regular case
> where you search a single index shard but allow multiple segments in it,
> letting Lucene manage segment creation/merging instead of effectively
> disabling it.
>
> But this is all speculation on my part, absent a concrete side-by-side
> comparison. Also, from your description, what you are doing to
> extend/replace internal plumbing in IndexSearcher sounds awfully complex
> and hard to envision... do make sure to run functional tests when it's all
> in place. :)
>
>
> On Wed, Oct 2, 2013 at 5:37 PM, Desidero <desidero@gmail.com> wrote:
>
> > Vitaly,
> >
> > Thanks for your comments.
> >
> > Unfortunately, thread pool task overload is not the problem. When I
> > extended the IndexSearcher class last night, I had it create one task per
> > shard (20 tasks) instead of the default, which turned out to be somewhere
> > around 320 (I didn't realize it created quite so many segments when I
> > recreated my index yesterday!), and performance was still awful as long
> > as there were multiple segments per shard.
> >
> > I understand that the fork/join framework doesn't work well with the
> > standard IndexSearcher's parallel search implementation, but I extended
> > the class and my implementation avoids the pitfalls as long as there are
> > very few (preferably one) segments per shard. The blocking in the
> > original SegmentWorkerCallable is still there, but I set it up so that if
> > one worker already has the lock, any others that try to lock it will just
> > throw their results in a separate queue which is polled by the thread
> > that managed to get the lock. After all the child tasks are complete, the
> > queue is checked one more time to be safe and then processing continues.
> > There is basically no waiting in that chunk of code (per the profiler).
> > After doing some tests with the ForkJoinPool and custom code as well as a
> > standard ThreadPoolExecutor, I found that the Fork/Join implementation
> > was faster.
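
For anyone following along, the try-lock/overflow-queue hand-off described
above boils down to something like the sketch below. The class and method
names are made up for illustration, not the actual code:

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.locks.ReentrantLock;

    // Each per-shard worker calls collect() with its partial result; a
    // worker that loses the race for the lock never blocks, it just parks
    // its result for the current lock holder to drain.
    final class NonBlockingCollector<T> {
        interface Merger<T> { void merge(T partial); }

        private final ReentrantLock lock = new ReentrantLock();
        private final ConcurrentLinkedQueue<T> overflow =
            new ConcurrentLinkedQueue<T>();
        private final Merger<T> merger;

        NonBlockingCollector(Merger<T> merger) { this.merger = merger; }

        void collect(T partial) {
            if (lock.tryLock()) {
                try {
                    merger.merge(partial);
                    drainQueue(); // also merge anything parked by others
                } finally {
                    lock.unlock();
                }
            } else {
                overflow.add(partial); // someone else holds the lock; don't wait
            }
        }

        // Called once after all child tasks complete, to catch stragglers.
        void drainRemaining() {
            drainQueue();
        }

        private void drainQueue() {
            T parked;
            while ((parked = overflow.poll()) != null) {
                merger.merge(parked);
            }
        }
    }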
> >
> > The system I'm working on is much more sensitive to response time
> > (latency) than overall throughput, so we've had to focus on parallelism
> > within queries rather than across many queries. Batch processing
> > performance is still important, but we cannot sacrifice quick "online"
> > responses. It would be much easier to avoid this whole mess, but we
> > cannot meet our performance requirements without it.
> >
> > Regards,
> >
> > Matt
> >
> >
> > On Oct 2, 2013 3:36 PM, "Vitaly Funstein" <vfunstein@gmail.com> wrote:
> >
> > > Matt,
> > >
> > > I think you are mostly on track with suspecting thread pool task
> > > overload as the possible culprit here. First, the old school (prior to
> > > Java 7) ThreadPoolExecutor only accepts a BlockingQueue to use
> > > internally for worker tasks, instead of a concurrent variant (not sure
> > > why). So this internal work queue will become a significant point of
> > > contention when using the pool in a pattern similar to your use case,
> > > i.e. submitting lots of tasks to the pool as fast as possible.
> > >
> > > Second, I am not too familiar with the internals of the fork/join pool
> > > implementation in 1.7 (if that's what you're using), but from reading
> > > the daunting javadoc for ForkJoinTask my rough guess is it's not
> > > terribly well suited for use in IndexSearcher. In particular, one of
> > > the possible "non-compliant" behaviors is that a mutex lock is taken
> > > for each call() invocation that operates on an individual leaf slice.
> > > This is evident from code inspection, and based on this, I am not sure
> > > what benefit, if any, multi-threaded search over a multi-segment index
> > > would provide in general - regardless of the choice of thread pool
> > > implementation.
> > >
> > > I think a better strategy, as mentioned in another thread, is to
> > > optimize your system for multiple concurrent queries, rather than
> > > focusing on forcing each query to run across multiple threads/cores.
> > > With that approach, you could, for instance, set up a non-blocking
> > > queue like ConcurrentLinkedQueue that would contain individual query
> > > tasks, then use a fixed pool of worker threads to consume the queue in
> > > a loop and run them. In this scenario, you shouldn't need to pass an
> > > ExecutorService instance to IndexSearcher.
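
Roughly something like the worker below; the names are invented for
illustration, and the poll/yield loop is just one simple way to consume a
non-blocking queue:

    import java.util.concurrent.ConcurrentLinkedQueue;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    // A fixed set of threads drains a non-blocking queue of whole queries;
    // the IndexSearcher is constructed WITHOUT an ExecutorService, so each
    // query runs single-threaded on whichever worker picked it up.
    final class QueryWorker implements Runnable {
        private final ConcurrentLinkedQueue<Query> pending;
        private final IndexSearcher searcher;

        QueryWorker(ConcurrentLinkedQueue<Query> pending, IndexSearcher searcher) {
            this.pending = pending;
            this.searcher = searcher;
        }

        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                Query q = pending.poll();
                if (q == null) {
                    Thread.yield(); // queue is empty; spin politely
                    continue;
                }
                try {
                    TopDocs hits = searcher.search(q, 10);
                    // ... hand hits back to whoever submitted the query ...
                } catch (Exception e) {
                    // log and keep consuming
                }
            }
        }
    }

You'd start a fixed number of these, e.g. one per core:
new Thread(new QueryWorker(queue, searcher)).start();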
> > >
> > > A strategy like that should provide for better query throughput,
> > > regardless of whether each shard consists of a single segment or not -
> > > provided each of the queries is tied to a particular shard and can't
> > > search any others.
> > >
> > >
> > >
> > > On Tue, Oct 1, 2013 at 4:10 PM, Desidero <desidero@gmail.com> wrote:
> > >
> > > > Uwe,
> > > >
> > > > I was using a bounded thread pool.
> > > >
> > > > I don't know if the problem was the task overload or something about
> > > > the actual efficiency of searching a single segment rather than
> > > > iterating over multiple AtomicReaderContexts, but I'd lean toward
> > > > task overload. I will do some testing tonight to find out for sure.
> > > >
> > > > Matt
> > > >
> > > > Hi,
> > > >
> > > > use a bounded thread pool.
> > > >
> > > > Uwe
> > > >
> > > > -----
> > > > Uwe Schindler
> > > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > > http://www.thetaphi.de
> > > > eMail: uwe@thetaphi.de
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Desidero [mailto:desidero@gmail.com]
> > > > > Sent: Tuesday, October 01, 2013 11:37 PM
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Re: Query performance in Lucene 4.x
> > > > >
> > > > > For anyone who was wondering, this was actually resolved in a
> > > > > different thread today. I misread the information in the
> > > > > IndexSearcher(IndexReader, ExecutorService) constructor
> > > > > documentation - I was under the impression that it submitted a task
> > > > > for each index shard (MultiReader wraps 20 shards, so 20 tasks),
> > > > > but it was really submitting a task for each segment within each
> > > > > shard (20 shards * ~10 segments = ~200 tasks), which is horrible.
> > > > > Since my index changes infrequently, I'm using forceMerge(1) before
> > > > > sending out updated indexes to the slave servers. Without any extra
> > > > > tuning (threads, # of shards, etc.) I've gone from ~2900 requests
> > > > > per minute to ~10k requests per minute.
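
The merge step is just the standard IndexWriter call; in a sketch like the
following, the path argument, Version constant, and analyzer choice are
illustrative, not necessarily what you'd use:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class MergeShard {
        // Collapse one shard to a single segment before shipping it out.
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File(args[0]));
            IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_44,
                new StandardAnalyzer(Version.LUCENE_44));
            IndexWriter writer = new IndexWriter(dir, cfg);
            writer.forceMerge(1); // expensive, but fine for an index that rarely changes
            writer.close();
        }
    }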
> > > > >
> > > > > Thanks to Adrien and Mike for the clarification, and Benson for
> > > > > bringing up the question that led to my answer.
> > > > >
> > > > > I'm still pretty new to Lucene so I have a lot of poking around to
> > > > > do, but I'm going to try to implement the "virtual segment" concept
> > > > > that Mike mentioned. It'll be really helpful for those of us who
> > > > > want parallelism within queries and don't want to forceMerge.
> > > > >
> > > > >
> > > > > On Fri, Sep 27, 2013 at 9:55 AM, Desidero <desidero@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Erick,
> > > > > >
> > > > > > Thank you for responding.
> > > > > >
> > > > > > I ran tests using both compressed fields and uncompressed fields,
> > > > > > and it was significantly slower with uncompressed fields. I
> > > > > > looked into the lazy field loading per your suggestion, but we
> > > > > > don't get any values from the returned Documents until the result
> > > > > > set has been appropriately reduced. Since we only store one
> > > > > > retrievable field and we always need to get it, it doesn't save
> > > > > > any time loading it lazily.
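
In 4.x, restricting the load to that one field looks roughly like the
method below; "data" and fetchValues are hypothetical names standing in for
the real field and call site:

    import java.util.Collections;
    import java.util.Set;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    public class SingleFieldLoader {
        // Load only the one stored field we actually use.
        public static String[] fetchValues(IndexSearcher searcher, Query query)
                throws Exception {
            Set<String> fields = Collections.singleton("data");
            TopDocs hits = searcher.search(query, 10);
            String[] values = new String[hits.scoreDocs.length];
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                ScoreDoc sd = hits.scoreDocs[i];
                Document doc = searcher.doc(sd.doc, fields); // loads only "data"
                values[i] = doc.get("data");
            }
            return values;
        }
    }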
> > > > > >
> > > > > > I'll try running a test without loading any fields just to see
> > > > > > how it affects performance and let you know how that goes.
> > > > > >
> > > > > > Regards,
> > > > > > Matt
> > > > > >
> > > > > >
> > > > > > On Fri, Sep 27, 2013 at 8:01 AM, Erick Erickson
> > > > > > <erickerickson@gmail.com> wrote:
> > > > > >
> > > > > >> Hmmm, since 4.1, fields have been stored compressed by default.
> > > > > >> I suppose it's possible that this is a result of
> > > > > >> compressing/uncompressing.
> > > > > >>
> > > > > >> What happens if
> > > > > >> 1> you enable lazy field loading
> > > > > >> 2> don't load any fields?
> > > > > >>
> > > > > >> FWIW,
> > > > > >> Erick
> > > > > >>
> > > > > >> On Thu, Sep 26, 2013 at 10:55 AM, Desidero <desidero@gmail.com>
> > > > > >> wrote:
> > > > > >> > A quick update:
> > > > > >> >
> > > > > >> > In order to confirm that none of the standard migration
> > > > > >> > changes had a negative effect on performance, I ported my
> > > > > >> > Lucene 4.x version back to Lucene 3.6.2 and kept the newer API
> > > > > >> > rather than using the custom ParallelMultiSearcher and other
> > > > > >> > deprecated methods/classes.
> > > > > >> >
> > > > > >> > Performance in 3.6.2 is even faster than before (~2900
> > > > > >> > requests/min with 4.x vs ~6200 requests/min with 3.6.2), so
> > > > > >> > none of my code changes should be causing the difference. It
> > > > > >> > seems to be something Lucene is doing under the covers.
> > > > > >> >
> > > > > >> > Again, if there's any other information I can provide to help
> > > > > >> > determine what's going on, please let me know.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Matt
> > > > > >> >
