lucene-java-user mailing list archives

From Arvind Kalyan <bas...@gmail.com>
Subject Re: Lucene 4 single segment performance improvement tips?
Date Wed, 05 Mar 2014 21:40:50 GMT
Thanks Mike. Good idea. We have a pretty thick stack and I had stripped it
down to just Jetty + Lucene, thinking that was bare-bones enough, but good
call on running it purely on Lucene. I'll see if it moves the needle
(hopefully it does).
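
A minimal sketch of the kind of standalone test suggested below, using only
JDK classes so it runs without the index at hand. The class name
StandaloneSearchBench, the Result holder and the placeholder workload are all
hypothetical; in a real run the Runnable would wrap a direct call on a shared
IndexSearcher (for example searcher.search(query, filter, 100)), which is an
assumption about the setup, not something taken from the actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class StandaloneSearchBench {

    /** Outcome of one run: how many operations finished and how long they took. */
    public static final class Result {
        public final long completed;
        public final long elapsedNanos;

        Result(long completed, long elapsedNanos) {
            this.completed = completed;
            this.elapsedNanos = elapsedNanos;
        }

        /** Operations per second across all threads. */
        public double qps() {
            return completed * 1e9 / Math.max(1L, elapsedNanos);
        }
    }

    /** Runs task opsPerThread times on each of `threads` threads, with no web server involved. */
    public static Result run(int threads, int opsPerThread, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    start.await(); // release all workers at the same moment
                    for (int i = 0; i < opsPerThread; i++) {
                        task.run();
                        completed.incrementAndGet();
                    }
                    return null;
                }));
            }
            long begin = System.nanoTime();
            start.countDown();
            for (Future<?> f : futures) {
                f.get(); // propagates any exception thrown by a worker
            }
            return new Result(completed.get(), System.nanoTime() - begin);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder CPU-bound workload (hypothetical). Replace with a direct
        // Lucene call on a shared IndexSearcher to reproduce the real test.
        Runnable task = () -> {
            long h = 17;
            for (int i = 0; i < 1_000; i++) {
                h = h * 31 + i;
            }
            if (h == 42) {
                System.out.println(h); // keeps the loop from being optimized away
            }
        };
        for (int threads : new int[] {1, 2, 4, 8}) {
            Result r = run(threads, 10_000, task);
            System.out.printf("threads=%d ops=%d qps=%.0f%n", threads, r.completed, r.qps());
        }
    }
}
```

If QPS stops scaling with the thread count here as well, Jetty is probably
not the limiting factor; if it scales cleanly, the container or the
connection handling deserves a closer look.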


On Wed, Mar 5, 2014 at 4:25 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> What sorts of queries are you running?  It seems like they must be
> very terms-dict intensive, e.g. primary key lookups or multi-term
> queries, and maybe not matching too many documents?
>
> It's strange you can't get CPU usage up, as you add threads.  Maybe
> simplify the test to remove Jetty?  Ie, a standalone test just
> invoking Lucene APIs directly using multiple threads.
>
> Does the profiler reveal any hot locks, where threads are having to
> wait to acquire the lock?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Mar 5, 2014 at 4:18 AM, Arvind Kalyan <base16@gmail.com> wrote:
> > Hi folks,
> >
> > We are currently using Lucene 4.5 and we are hitting some bottlenecks and
> > would appreciate some input from the community.
> >
> > This particular index (the disk size for which is about 10GB) is
> > guaranteed to not have any updates, so we made it a single segment index
> > by doing a forceMerge(1). The index is guaranteed to be in-memory as
> > well: we use the MMapDirectory and the whole thing is mlocked after load.
> > So there is no disk I/O.
> >
> > Our runtime/search use-case is very simple: run filters to select all
> > docs that match some conditions specified in a filter query (we do not
> > use Lucene scoring) and return the first 100 docs that match (this is an
> > over-simplification).
> >
> > On a machine with nothing else running, we are unable to move the needle
> > on CPU utilization to serve higher QPS. We see that most of the time is
> > spent in BlockTreeTermsReader.FieldReader.iterator() when we run
> > profiling tools to see where time is being spent. The CPU usage doesn't
> > cross 30% (we have multiple threads, one per client connected over a
> > Jetty connection, all taken from a bounded thread-pool). We tried the
> > usual suspects like tweaking the size of the threadpool, changing some
> > JVM parameters like newsize, heapsize, using CMS for old gen, ParNew for
> > new gen, etc.
> >
> > Does anyone here have any pointers or general suggestions on how we can
> > get good performance out of Lucene 4.x? Specifically IndexSearcher
> > performance improvements for large, single-segment AtomicReaders.
> >
> > I'll share more specifics if necessary but I'd like to hear from folks
> > here what your experience has been and what you did to speed up your
> > IndexSearchers to improve throughput *and/or* latency.
> >
> > Thanks!
> >
> > --
> > Arvind Kalyan
> > http://www.linkedin.com/in/base16
>


-- 
Arvind Kalyan
http://www.linkedin.com/in/base16
cell: (408) 761-2030
