lucene-java-user mailing list archives

From "De Simone, Alessandro" <Alessandro.DeSim...@bvdinfo.com>
Subject RE: search time & number of segments
Date Mon, 19 May 2014 09:54:46 GMT
Thank you for your input.

> How much RAM does your search machine have?

We have 16 GB of RAM, and at least 8 GB of it is free for the OS file cache. The cache
is working pretty well.

> That sounds right. Although each segment is 1/16 of the full index size, the number of
> seeks per segment is not 1/16: Larger indexes require relatively fewer seeks. Think binary
> search and log(values_in_field), although that is highly simplified.

The "IO calls" I was referring to are the number of times the "BufferedIndexInput.refill()"
function is called. So it means that we read roughly 16 times more bytes when there are 16
segments, for the exact same result.
I would have agreed to blame seeks if Lucene were reading more or less the same number of bytes
but with worse performance. In fact, that's exactly what I was expecting. But this is not
the case here.
It's almost as if extracting the terms stats (or whatever metadata the segment has) is more
costly than the search itself. And I'm not talking about queries with few results. 
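
As a rough sanity check on the figures quoted in this thread (143 refill calls with one
segment, 2432 with 16 segments), here is a minimal sketch of the arithmetic. The class and
method names are hypothetical and have nothing to do with Lucene's API. Reading the refill
count as "bytes read" also assumes that every refill fetches about the same number of bytes.

```java
// Back-of-the-envelope arithmetic only; names are hypothetical, not Lucene APIs.
class RefillMath {

    // How many times more refill() calls the multi-segment index needed.
    static double ratio(int refillsAfter, int refillsBefore) {
        return (double) refillsAfter / refillsBefore;
    }

    // Average refill() calls per segment (integer division).
    static int perSegment(int totalRefills, int segments) {
        return totalRefills / segments;
    }

    public static void main(String[] args) {
        int optimized = 143;   // refill() calls with a single segment (from this thread)
        int unoptimized = 2432; // refill() calls with 16 segments (from this thread)
        System.out.println("ratio: " + ratio(unoptimized, optimized));
        System.out.println("per segment: " + perSegment(unoptimized, 16));
    }
}
```

That works out to 152 refills per segment, slightly more than the 143 needed for the whole
optimized index, which is why the total ends up about 17 times higher.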

> I am guessing that you are using spinning drives and that there is not much RAM in the
> machine?

As you can see, we have a lot of RAM. Using the resource manager I see that nothing is thrashing
the system or swapping to disk. Lucene is just a lot slower for every query. When the data a
query needs is already in the OS cache, the call takes a few milliseconds as expected.


Alessandro De Simone

-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk] 
Sent: Saturday, 17 May 2014 20:04
To: java-user@lucene.apache.org
Subject: RE: search time & number of segments

De Simone, Alessandro [Alessandro.DeSimone@bvdinfo.com] wrote:
> We have a performance issue ever since we stopped optimizing the index. We are using
> Lucene 4.8 (jvm 32bits for searching, 64bits for indexing) on Windows 2008R2.

How much RAM does your search machine have?

> For instance, a search with (2 termQuery + 1 spanquery) x 6 fields made 143 IO calls.
> Now with 16 segments we have 2432 IO calls and the search time is really bad.

[...]

That sounds right. Although each segment is 1/16 of the full index size, the number of seeks
per segment is not 1/16: Larger indexes require relatively fewer seeks. Think binary search
and log(values_in_field), although that is highly simplified.
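
To make the simplified binary-search picture concrete, here is a small sketch. It assumes
that a term lookup costs about as many probes as a worst-case binary search over the
segment's terms, and that the terms split evenly across segments. The figure of 16 million
terms is made up for illustration, and the class and method names are hypothetical. Real
Lucene term dictionaries do not work this way, so treat it as an illustration of the
logarithm argument only.

```java
// Illustrative model only: class and method names are hypothetical, not Lucene APIs.
class SegmentSeekModel {

    // Worst-case probes of a binary search over n sorted entries: floor(log2(n)) + 1.
    static int probes(long n) {
        return n <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(n);
    }

    // Total probes when the terms are split evenly over `segments` segments
    // and every segment has to be searched once.
    static long totalProbes(long totalTerms, int segments) {
        return (long) segments * probes(totalTerms / segments);
    }

    public static void main(String[] args) {
        long terms = 16_000_000L; // assumed term count, not taken from the thread
        System.out.println("1 segment:   " + totalProbes(terms, 1));
        System.out.println("16 segments: " + totalProbes(terms, 16));
    }
}
```

Under these assumptions a single segment needs 24 probes, while 16 segments need
16 x 20 = 320: each small segment costs nearly as much to search as the whole index.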

> The size of the index is ~24 GB (14 million documents). No fields are stored, only indexed.

Normally the penalty of running un-optimized is not that great, so it sounds like your machine
cannot provide the I/O speed it needs (as opposed to having a great logistics overhead from
the multiple segments). I am guessing that you are using spinning drives and that there is
not much RAM in the machine? The easy solution is either to throw RAM at the problem or switch
to SSD.

- Toke Eskildsen
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



