lucene-java-user mailing list archives

From "Yonik Seeley"
Subject Re: indexing issue
Date Sat, 29 Nov 2008 19:11:18 GMT
On Sat, Nov 29, 2008 at 12:45 PM, Michael Stoppelman <> wrote:
> Hi all,
> I've got an indexing issue I think other folks might be interested in
> hearing about and I wanted to get feedback before I went ahead and
> implemented a new method.
> Currently, the way we update indices is by sending individual delete/add
> document requests to all our search boxes individually. Each box is doing
> about 20-30 qps while this is happening. The problem I'm seeing is that when
> a segment of the index is merged [honestly, I don't know that much about
> segment merging] (our merge factor is set to 5) and an old, heavily used
> segment is evicted from the disk cache, most of the search requests to that
> box get prohibitively slow (10-80+ secs) and I see the pg/in and pg/out
> stats spike in sar.

> I'm planning on implementing a method similar to the
> Solr model, using the rsync approach that Doug Cutting outlined a long time ago
> on this list, and forcing the new files into the disk cache using fadvise.

FYI, if you don't actually want to use rsync, Solr has very recently
implemented this in pure Java.
But forcing new files into the cache means ejecting currently used
files from the cache (for queries currently in flight).  There doesn't
seem to be any way to really win here if you don't have enough memory
to hold the critical parts of both indexes.

I think Mike pointed out in a recent post that it would be nice to
advise the kernel not to cache files being written during merging.  In
Solr, we would want to allow the same thing when copying over new
segment files during replication.  Unfortunately, I don't think there
is a way to do this through Java.
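
For what it's worth, the opposite direction (pulling a new file into
the OS cache) can be roughly approximated from plain Java by reading
the file once and throwing the bytes away. The sketch below is only an
illustration of that idea, not what Solr does and not a substitute for
fadvise: the class name PageCacheWarmer and the 1 MB buffer size are
made up for the example, and whether the pages actually stay cached is
entirely up to the kernel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not part of Lucene or Solr): reads a file
// sequentially so the OS has a chance to keep its pages in the cache.
final class PageCacheWarmer {
    private PageCacheWarmer() {}

    /** Reads the whole file once, discards the data, returns bytes read. */
    static long warm(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1 << 20); // 1 MB, arbitrary choice
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // reuse the buffer; the contents are not needed
            }
        }
        return total;
    }

    // Usage: java PageCacheWarmer <file> [<file> ...]
    public static void main(String[] args) throws IOException {
        for (String a : args) {
            System.out.println(a + ": " + warm(Paths.get(a)) + " bytes read");
        }
    }
}
```

Note that this has exactly the downside described above: every page
read this way competes with the pages that in-flight queries are using,
so it only helps if there is enough memory for both sets of files.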


> Is there another strategy here? Could I create a merge policy that forces
> new segments into the disk cache before lucene nukes the old ones?
> Thanks,
> M
