lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nigel <>
Subject Re: Analyzing performance and memory consumption for boolean queries
Date Wed, 24 Jun 2009 19:38:07 GMT
Hi Mike,

Yes, we're indexing on a separate server, and rsyncing from index snapshots
there to the search servers.  Usually rsync has to copy just a few small
.cfs files, but every once in a while merging will product a big one.  I'm
going to try to limit this by setting maxMergeMB, but of course that's a
trade-off with having more segments.

It sounds like surely any swapping out of the JVM memory could cause big and
unpredictable performance drops.  As I just mentioned in reply to Uwe, our
poor performance times don't always directly correlate with index updates,
but it may be that the damage is done and the effects are only seen sometime

We don't store norms (since we don't care about sort order), and we don't
have any deleted docs (since the index is read-only on the search servers).
What exactly is stored in the field cache?

p.s. I haven't tried SSDs yet, or for that matter faster disks of any sort.
First I'd like to get a better understanding of what I/O is required and
when during the search process, ideally to be able to have an approximate
model that predicts I/O based on the query (the way a DBA might do when
estimating how a SQL query would work with certain tables and indexes).


On Wed, Jun 24, 2009 at 5:06 AM, Michael McCandless <> wrote:

> Is it possible the occasional large merge is clearing out the IO cache
> (thus "unwarming" your searcher)?  (Though since you're rsync'ing your
> updates in, it sounds like a separate machine is building the index).
> Or... linux will happily swap out a process's core in favor of IO
> cache (though I'd expect this effect to be much less spikey).  You can
> tune "swappiness" to have it not do that:
> Maybe Lucene's norms/deleted docs/field cache were getting swapped out?
> Lucene's postings reside entirely on disk (ie, Lucene doesn't cache
> those in RAM; we rely on the OS's IO cache).  Lucene does a linear
> scan through the terms in the query... Linux will readahead, though,
> if things are fragmented this could mean lots of seeking.
> Have you tried putting the index on an SSD instead of a spinning magnetic
> disk?
> Mike

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message