lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene 4 single segment performance improvement tips?
Date Wed, 05 Mar 2014 12:25:21 GMT
What sorts of queries are you running?  It seems like they must be
very terms-dict intensive, e.g. primary key lookups or multi-term
queries, and maybe not matching too many documents?

It's strange that you can't get CPU usage up as you add threads.  Maybe
simplify the test to remove Jetty?  I.e., a standalone test just
invoking Lucene APIs directly using multiple threads.
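
A minimal sketch of such a standalone test, using only the JDK. The class
name StandaloneSearchBench, the run() helper and the placeholder task are
hypothetical and not from this thread; in a real test the Runnable would wrap
the actual search call against the shared IndexSearcher. The idea is to run
the same task in a tight loop at several thread counts and see whether
throughput scales.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class StandaloneSearchBench {

    /** Runs task in a tight loop on `threads` threads for about `millis` ms; returns total completed calls. */
    static long run(int threads, long millis, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
                long local = 0; // per-thread counter, so counting adds no contention
                while (deadline - System.nanoTime() > 0) {
                    task.run();
                    local++;
                }
                completed.addAndGet(local);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(millis + 60_000, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        AtomicLong sink = new AtomicLong();
        // Placeholder CPU work (assumption). Replace the body with the real
        // query against a single shared searcher instance.
        Runnable placeholder = () -> {
            long h = 17;
            for (int i = 0; i < 10_000; i++) h = h * 31 + i;
            sink.lazySet(h); // keep the result observable so the loop is not optimized away
        };
        for (int threads : new int[] {1, 2, 4, 8}) {
            long calls = run(threads, 2_000, placeholder);
            System.out.printf("threads=%d  calls/sec=%d%n", threads, calls / 2);
        }
    }
}
```

If calls/sec stops growing well before the thread count reaches the number of
cores, the bottleneck is probably inside the task itself rather than in the
HTTP layer.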

Does the profiler reveal any hot locks, where threads are having to
wait to acquire the lock?
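
If no profiler view of lock contention is at hand, the JDK's own thread
management bean gives a rough picture. The sketch below is an illustration
only; the class name LockContentionReport and its methods are hypothetical.
It reports, per live thread, how often the thread blocked while entering a
monitor. Note that these counters cover synchronized monitors; threads parked
on java.util.concurrent locks show up as waiting, not blocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class LockContentionReport {

    /** Sum of monitor-blocked counts over all live threads. */
    static long totalBlockedCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long total = 0;
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) { // a thread may have exited since the id was listed
                total += info.getBlockedCount();
            }
        }
        return total;
    }

    /** Prints every live thread that has blocked on a monitor at least once. */
    static void print() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            // Blocked time is only recorded from the moment this is enabled.
            mx.setThreadContentionMonitoringEnabled(true);
        }
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null || info.getBlockedCount() == 0) {
                continue;
            }
            // getBlockedTime() is -1 when contention monitoring is disabled;
            // getLockName() is null unless the thread is blocked right now.
            System.out.printf("%s blockedCount=%d blockedMs=%d currentLock=%s%n",
                    info.getThreadName(), info.getBlockedCount(),
                    info.getBlockedTime(), info.getLockName());
        }
    }

    public static void main(String[] args) {
        print();
        System.out.println("total blocked count: " + totalBlockedCount());
    }
}
```

Calling print() periodically from inside the load test, while the search
threads are running, shows whether blocked counts climb as threads are added.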

Mike McCandless

http://blog.mikemccandless.com


On Wed, Mar 5, 2014 at 4:18 AM, Arvind Kalyan <base16@gmail.com> wrote:
> Hi folks,
>
> We are currently using Lucene 4.5 and we are hitting some bottlenecks and
> would appreciate some input from the community.
>
> This particular index (the disk size for which is about 10GB) is guaranteed
> to not have any updates, so we made it a single segment index by doing a
> forceMerge(1). The index is guaranteed to be in-memory as well: we use the
> MMapDirectory and the whole thing is mlocked after load. So there is no
> disk I/O.
>
> Our runtime/search use-case is very simple: run filters to select all docs
> that match some conditions specified in a filter query (we do not use
> Lucene scoring) and return the first 100 docs that match (this is an
> over-simplification)
>
> On a machine with nothing else running, we are unable to move the needle on
> CPU utilization to serve higher QPS. We see that most of the time is spent
> in BlockTreeTermsReader.FieldReader.iterator() when we run profiling tools
> to see where time is being spent. The CPU usage doesn't cross 30% (we have
> multiple threads, one per client connected over a Jetty connection, all
> taken from a bounded thread pool). We tried the usual suspects like
> tweaking size of the threadpool, changing some jvm parameters like newsize,
> heapsize, using cms for old gen, parnew for newgen, etc.
>
> Does anyone here have any pointers or general suggestions on how we can get good
> performance out of Lucene 4.x? Specifically IndexSearcher performance
> improvements for large, single-segment, atomicreaders.
>
> I'll share more specifics if necessary but I'd like to hear from folks here
> what your experience has been and what you did to speed up your
> IndexSearchers to improve throughput *and/or* latency.
>
> Thanks!
>
> --
> Arvind Kalyan
> http://www.linkedin.com/in/base16


