lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nigel <nigelspl...@gmail.com>
Subject Re: Efficiently reopening remotely-distributed indexes in 2.9?
Date Wed, 07 Oct 2009 19:51:26 GMT
Right now we logically re-open an index by making an updated copy of the
index in a new directory (using rsync etc.), opening the new copy, and
closing the old one.  We don't use IndexReader.reopen() because the updated
index is in a different directory (as opposed to being updated in-place).

(Reading about some of the 2.9 changes motivated me to look into actually
using reopen().  And Michael Busch and Mark Miller both pointed out that I
was incorrect in saying that pre-2.9 reopen() wasn't more efficient than
just opening a new index -- I've read through that code now so I have at
least a basic understanding of what's happening there.  Anyway, it seems
like reopen() is a Good Thing, so I'd like to use it. (-:)

So, my real question was whether there is a "recommended" way to update an
index in-place with files copied from a separate indexing server.

For example, do you simply rsync in the new cfs files, overwrite the
segments.gen and segments_XX files, and call reopen()?  Or create an updated
copy in a new directory, then rename new directory to the old name once
you're sure you've copied everything successfully, then call reopen()?  What
does Solr do?

Thanks,
Chris

On Mon, Oct 5, 2009 at 8:39 PM, Jason Rutherglen <jason.rutherglen@gmail.com
> wrote:

> I'm not sure I understand the question. You're trying to reopen
> the segments that you're replicated and you're wondering what's
> changed in Lucene?
>
> On Mon, Oct 5, 2009 at 5:30 PM, Nigel <nigelspleen@gmail.com> wrote:
> > Anyone have any ideas here?  I imagine a lot of other people will have a
> > similar question when trying to take advantage of the reopen improvements
> in
> > 2.9.
> >
> > Thanks,
> > Chris
> >
> > On Thu, Oct 1, 2009 at 5:15 PM, Nigel <nigelspleen@gmail.com> wrote:
> >
> >> I have a question about the reopen functionality in Lucene 2.9.  As I
> >> understand it, since FieldCaches are now per-segment, it can avoid
> reloading
> >> everything when the index is reopened, and instead just load the new
> >> segments.
> >>
> >> For background, like many people we have a distributed architecture
> where
> >> indexes are created on one server and copied to multiple other servers.
>  The
> >> way that copying works now is something like the following:
> >>
> >>    1. Let's say the current index is in /indexes/a and is open
> >>    2. An empty directory for the updated index is created, let's say
> >>    /indexes/b
> >>    3. Hard links for the files in /indexes/a are created in /indexes/b
> >>    4. We rsync the current index on the server with /indexes/b, thus
> >>    copying over new cfs files and deleting hard links to files no longer
> in use
> >>    5. A new IndexReader is opened for /indexes/b and warmed up
> >>    6. The application starts using the new reader instead of the old one
> >>    7. The old IndexReader is closed and /indexes/a is deleted
> >>
> >> I'm simplifying a few steps, but I think this is familiar to many
> people,
> >> and it's my impression that Solr implements something similar.
> >>
> >> The point is, the updated index lives in a new directory in this scheme,
> >> and so we don't actually reopen the existing IndexReader; we open a new
> one
> >> with a different FSDirectory.
> >>
> >> Before Lucene 2.9, I don't think this made any difference, as (I think)
> the
> >> only advantage to calling reopen vs. just creating another IndexReader
> was
> >> having reopen figure out whether the index had actually changed.  (And
> whave
> >> a different way to figure that out, so it was a non-issue.)
> >>
> >> With Lucene 2.9, there's now a big difference, namely the per-segment
> >> caching mentioned above.  So the question is how to make use of reopen
> with
> >> our distribution scheme.  Is there an informal best practice for
> handling
> >> this case?  For example, should step #5 above rename /indexes/b to
> >> /indexes/a so the index can be reopened in the same physical location?
>  Or
> >> should rsync operate on the existing directory in-place, updating the
> >> segments* files last and relying on the fact that deleted files will not
> >> really be deleted (on Linux, at least) as long as the app is still
> holding
> >> them open?
> >>
> >> I guess the answer may depend on how exactly reopen knows which files
> are
> >> the "same" (e.g. does it look at filenames, or file descriptors, etc.).
> >>
> >> Thanks,
> >> Chris
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message