lucene-java-user mailing list archives

From Arvind Kalyan <bas...@gmail.com>
Subject Re: A question on performance
Date Wed, 07 Jan 2015 16:27:35 GMT
Performance measurements must be made carefully. Have you performed any
warmup?

I recommend doing 10k calls just to let the dust settle, including things
like JIT compilation, before taking any kind of measurements. Also use
MMapDirectory, if you aren't already, to help with spikes in disk accesses.
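
To make the warmup suggestion concrete, here is a minimal sketch of a harness that runs unmeasured warmup calls and then reports latency percentiles. The class name LatencyBench, the method names, and the stand-in workload are all hypothetical; in a real test the Runnable would wrap the actual search call against your index.

```java
import java.util.Arrays;

public class LatencyBench {
    // Written by the stand-in workload so the JIT cannot discard it as dead code.
    static volatile long sink;

    /** Nearest-rank percentile over an already sorted array of samples. */
    public static long percentile(long[] sorted, double pct) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Runs warmupCalls unmeasured, then returns measuredCalls latencies (ns), sorted ascending. */
    public static long[] measure(Runnable query, int warmupCalls, int measuredCalls) {
        for (int i = 0; i < warmupCalls; i++) {
            query.run(); // results discarded: lets JIT and caches settle
        }
        long[] nanos = new long[measuredCalls];
        for (int i = 0; i < measuredCalls; i++) {
            long start = System.nanoTime();
            query.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    public static void main(String[] args) {
        // Stand-in for a real query; replace with the actual search call.
        Runnable query = () -> {
            long s = 0;
            for (int i = 0; i < 1000; i++) {
                s += i * 31L;
            }
            sink = s;
        };
        // 10k warmup calls, then 1000 measured calls.
        long[] nanos = measure(query, 10_000, 1_000);
        for (double p : new double[] {50, 75, 90, 95, 99, 99.9}) {
            System.out.printf("%s = %.3f ms%n", p, percentile(nanos, p) / 1e6);
        }
    }
}
```

If the tail percentiles drop sharply once the warmup calls are excluded, the outliers were most likely startup effects rather than steady-state query cost.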

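Also keep track of garbage collections that happened during your profiling.
That is a different problem to solve and has different solutions. But most
importantly, make sure you don't use a big heap just to accommodate the big
index if you are using MMapDirectory: the index files are then served from
the OS page cache rather than the Java heap, so memory given to the heap is
memory taken away from that cache.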
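
One way to keep track of collections around a measured run is to snapshot the JVM's GC counters before and after it. The sketch below uses the standard java.lang.management beans; the class name GcSnapshot and its methods are hypothetical helpers for illustration, not part of Lucene.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {
    public final long count;      // total collections across all collectors
    public final long timeMillis; // approximate accumulated collection time

    GcSnapshot(long count, long timeMillis) {
        this.count = count;
        this.timeMillis = timeMillis;
    }

    /** Sums the counters of every collector the JVM exposes. */
    public static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when undefined for this collector
            long t = gc.getCollectionTime();  // -1 when undefined for this collector
            if (c > 0) count += c;
            if (t > 0) time += t;
        }
        return new GcSnapshot(count, time);
    }

    /** Collections and GC time that occurred between an earlier snapshot and this one. */
    public GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(count - earlier.count, timeMillis - earlier.timeMillis);
    }

    public static void main(String[] args) {
        GcSnapshot before = GcSnapshot.take();
        // Stand-in for the measured query run: allocate some short-lived garbage.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[1024];
            junk[0] = (byte) i;
        }
        GcSnapshot delta = GcSnapshot.take().since(before);
        System.out.println("collections=" + delta.count + " gcTimeMs=" + delta.timeMillis);
    }
}
```

A nonzero delta during the measured window suggests the slow queries are worth lining up against GC pauses. Running the JVM with -verbose:gc gives per-collection detail if the totals look suspicious.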

There are probably a few more things I'd do given various other
requirements (like disabling swap) and constraints.

On Wednesday, January 7, 2015, rama44ster <rama44ster@gmail.com> wrote:

> Hi,
> I have a lucene index which has close to 480M documents. And I ran around
> 1000 queries against the index. Each query is a boolean query with 3
> different tokens. That is the query has 3 operands which MUST occur.
> Executing such 3 token queries gives the following latency percentiles.
>
> 50 = 16 ms
> 75 = 52 ms
> 90 = 121 ms
> 95 = 262 ms
> 99 = 76010 ms
> 99.9 = 76037 ms
>
> Is the latency expected to degrade when the number of docs is as high as
> 480M? The size of the index is 36G. All the segments in the index are
> merged into one segment. Even when the segments are not merged, the
> latencies are not very different. Each document has 5-6 stored fields. But
> as mentioned above, the above latencies are for boolean queries that don't
> access any stored fields, but just do a posting list lookup on 3 tokens.
>
> Any ideas on what could be wrong here?
>
> Thanks in advance,
> Prasad.
>


-- 
Arvind Kalyan
http://www.linkedin.com/in/base16
cell: (408) 761-2030
