lucene-java-user mailing list archives

From "Michael Stoppelman" <stop...@gmail.com>
Subject Re: indexing issue
Date Sun, 14 Dec 2008 21:53:50 GMT
On Sat, Nov 29, 2008 at 11:11 AM, Yonik Seeley <yonik@apache.org> wrote:

> On Sat, Nov 29, 2008 at 12:45 PM, Michael Stoppelman <stopman@gmail.com>
> wrote:
> > Hi all,
> >
> > I've got an indexing issue I think other folks might be interested in
> > hearing about and I wanted to get feedback before I went ahead and
> > implemented a new method.
> >
> > Currently, we update indices by sending individual delete/add document
> > requests to each of our search boxes. Each box is serving about 20-30 qps
> > while this is happening. The problem I'm seeing is that when a segment of
> > the index is merged [honestly I don't know that much about segment
> > merging] (our merge factor is set to 5) and an old, highly used segment of
> > the index is lost from the disk cache, most of the search requests to that
> > box get prohibitively slow (10-80+ secs) and I see the pg/in + pg/out
> > stats spike in sar.
>
> > I'm planning on implementing a method similar to the Solr model, using the
> > rsync method that Doug Cutting outlined a long time ago on this list, and
> > forcing the new files into the disk cache using fadvise.
>
> FYI, if you don't actually want to use rsync, Solr has very recently
> implemented this in pure Java.
> But forcing new files into the cache means ejecting currently used
> files from the cache (for queries currently in flight).  There doesn't
> seem to be any way to really win here if you don't have enough memory to
> hold the critical parts of both indexes.
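
For concreteness, here is a minimal sketch of what "forcing the new files into the disk cache" could look like, assuming a Unix system where Python's os.posix_fadvise is available. The helper names warm_file and warm_directory are made up for illustration; nothing here comes from Lucene or Solr. The call is only a hint, so the kernel may ignore it, and as noted above any pages it does load can push other pages out.

```python
import os


def warm_file(path):
    """Hint the kernel to read `path` into the page cache; returns its size."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 covers the whole file. Advisory only: the kernel
        # is free to ignore the request or to read just part of the file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


def warm_directory(index_dir):
    """Issue the hint for every regular file directly under `index_dir`.

    Hypothetical helper: returns the total number of bytes hinted.
    """
    total = 0
    for entry in os.scandir(index_dir):
        if entry.is_file():
            total += warm_file(entry.path)
    return total
```

Something like warm_directory("/path/to/new/index") would run after the copy finishes and before searches are switched to the new files; the path is a placeholder.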


I found an rsync patch that adds a --drop-caches flag, which lets you tell
the kernel not to keep the new files being copied over in the disk cache.

http://lists.samba.org/archive/rsync/2007-May/017707.html
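
A rough sketch of the same idea outside rsync, assuming a Unix system with os.posix_fadvise: copy a file, flush it to disk, then tell the kernel the destination's pages are not needed. I have not checked how the patch itself implements the flag, so treat this as an illustration of the technique only. The function name copy_without_caching and the chunk size are made up.

```python
import os

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary choice for the sketch


def copy_without_caching(src, dst):
    """Copy src to dst, then advise the kernel to drop dst's cached pages.

    Returns the number of bytes written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(CHUNK_SIZE)
            if not buf:
                break
            fout.write(buf)
            written += len(buf)
        # Dirty pages cannot be dropped, so push the data to disk first.
        fout.flush()
        os.fsync(fout.fileno())
        # offset=0, len=0 covers the whole file. This is a hint; the kernel
        # decides whether the pages are actually released.
        os.posix_fadvise(fout.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return written
```

The point of doing it this way is that pages belonging to the live index stay in the cache while the new files land on disk. The cost is that the new files are cold when searches first touch them.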


>
>
> I think Mike pointed out in a recent post that it would be nice to
> advise the kernel not to cache files being written during merging.  In
> Solr, we would want to allow the same thing when copying over new
> segment files during replication.  Unfortunately, I don't think there
> is a way to do this through Java.
>
> -Yonik
>
> > Is there another strategy here? Could I create a merge policy that forces
> > new segments into the disk cache before lucene nukes the old ones?
> >
> > Thanks,
> > M
