lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Faking index merge by modifying segments file?
Date Wed, 02 Nov 2005 11:47:40 GMT
Hello,

--- Paul Elschot <paul.elschot@xs4all.nl> wrote:

> On Tuesday 01 November 2005 08:51, Otis Gospodnetic wrote:
> > Hello,
> > 
> > I spent most of today talking to some people about Lucene, and one of
> > them said he would really like to have an "instantaneous index merge",
> > and that he thinks he could achieve it by simply opening the segments
> > file of one index and adding the segment names of the other
> > index/indices, plus adjusting the segment size (SegSize in
> > fileformats.html), thus creating a single (but unoptimized) index.
> > 
> > Any reactions to that?
> > 
> > I imagine this isn't quite that simple to implement, as one would have
> > to renumber all documents in order to avoid having multiple documents
> > with the same document id.
> > 
> > Can anyone think of any other problems with this approach, or perhaps
> > offer ideas for possible document renumbering?
> 
> Document numbers within segments are determined dynamically in the
> index reader, so these should not be a problem. Each segment simply
> numbers its documents from zero.
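
Paul's point can be illustrated with a small model: if each segment numbers its documents from zero, a reader spanning several segments only needs a table of running offsets to translate between index-wide and per-segment numbers. This is a hypothetical Java sketch (SegmentDocBase and its methods are invented for illustration, not Lucene's API), assuming the per-segment document counts are known, as they are from SegSize in the segments file.

```java
/**
 * Hypothetical model (not Lucene's API) of how a reader over several
 * segments maps index-wide document numbers onto per-segment numbers,
 * given that every segment numbers its own documents from zero.
 */
class SegmentDocBase {
    // starts[i] is the index-wide number of segment i's document 0;
    // the extra last entry is the total document count.
    private final int[] starts;

    SegmentDocBase(int[] segmentDocCounts) {
        starts = new int[segmentDocCounts.length + 1];
        for (int i = 0; i < segmentDocCounts.length; i++) {
            starts[i + 1] = starts[i] + segmentDocCounts[i];
        }
    }

    int maxDoc() {
        return starts[starts.length - 1];
    }

    /** Segment holding index-wide document n: the last segment whose start is <= n. */
    int segmentOf(int n) {
        if (n < 0 || n >= maxDoc()) {
            throw new IllegalArgumentException("no such document: " + n);
        }
        int lo = 0;
        int hi = starts.length - 2; // index of the last segment
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
            if (starts[mid] <= n) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Number of document n inside its own segment (counting from zero). */
    int localDoc(int n) {
        return n - starts[segmentOf(n)];
    }

    /** Index-wide number of the document numbered local inside segment seg. */
    int globalDoc(int seg, int local) {
        return starts[seg] + local;
    }
}
```

For segment counts {3, 0, 5}, index-wide document 3 lands in segment 2 as its local document 0; the empty segment 1 is skipped because the search picks the last segment whose start is not past the document. Appending segments from another index only extends this table, so no stored number has to change.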

Uh, and I always thought they were stored in the index.  Aren't they
stored in the .fdx and .fdt files?  And shouldn't they also be referenced
from somewhere?  I see document numbers mentioned in the description of
the .frq file.
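
As far as I can tell from fileformats.html, document numbers are mostly implicit: .fdx holds one fixed-size (8-byte) pointer into .fdt per document, so document n's entry is found at position n*8 rather than through a stored id, and .frq stores gaps between document numbers within one segment. A minimal decoding sketch of one term's TermFreqs, assuming the VInts have already been read from the byte stream (VInt byte decoding is omitted, and FrqDecoder is a hypothetical name): DocDelta/2 is the gap from the previous document, an odd DocDelta means a frequency of one, and an even one means the frequency follows as its own VInt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for one term's TermFreqs from already-decoded VInts. */
class FrqDecoder {
    /** Returns {docNumber, freq} pairs; docNumber is relative to the segment. */
    static List<int[]> decode(int[] vints) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0; // the first gap is measured from zero
        int i = 0;
        while (i < vints.length) {
            int docDelta = vints[i++];
            doc += docDelta >>> 1; // DocDelta/2 is the gap from the previous doc
            int freq;
            if ((docDelta & 1) != 0) {
                freq = 1; // odd DocDelta: frequency is one
            } else {
                if (i >= vints.length) {
                    throw new IllegalArgumentException("missing Freq after DocDelta");
                }
                freq = vints[i++]; // even DocDelta: frequency follows
            }
            postings.add(new int[] {doc, freq});
        }
        return postings;
    }
}
```

The sequence 15, 8, 3 decodes to document 7 with frequency 1 and document 11 with frequency 3. Since each segment has its own .frq and the gaps start from zero, these numbers are local to the segment, which is consistent with Paul's remark.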

> IIRC the segment names determine the order
> of the segments for an index reader.
>
> I think creating a new index by adding segments from an existing one
> should be fairly straightforward. Some care will be needed to avoid
> clashes in the segment names.

You mean ensuring that segment _x from index A doesn't clash with _x
from index B?  Segment names are written only in the segments file, I
believe, so I think if I detect that _x is already taken, I could
simply rename it to something (e.g. _foo) that hasn't been taken yet,
and remember to use that segment name when writing the segments file.
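
A sketch of the rename step on a hypothetical in-memory model of the segments file (SegEntry and FakeMerge are invented names; the real file also carries header fields such as a version and a name counter, which this model leaves out and which would need the counter bumped past any new names). Fresh names follow the "_" plus base-36 counter pattern Lucene uses for segment names, though any unused name would do. One caveat: every file belonging to a segment carries the segment name as its prefix (_x.fdt, _x.fdx, _x.frq, ...), so a renamed segment's files would also have to be renamed, or linked, under the new name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for one <SegName, SegSize> entry. */
final class SegEntry {
    final String name;
    final int docCount;

    SegEntry(String name, int docCount) {
        this.name = name;
        this.docCount = docCount;
    }
}

final class FakeMerge {
    /**
     * Appends B's segment entries after A's. A B segment whose name is already
     * used in the merged list gets a fresh "_" + base-36 name that appears in
     * neither index; each such rename is recorded in renames (old -> new).
     */
    static List<SegEntry> merge(List<SegEntry> a, List<SegEntry> b,
                                Map<String, String> renames) {
        Set<String> taken = new HashSet<>(); // every name in A or B
        for (SegEntry e : a) taken.add(e.name);
        for (SegEntry e : b) taken.add(e.name);

        List<SegEntry> merged = new ArrayList<>(a);
        Set<String> used = new HashSet<>(); // names already in the merged list
        for (SegEntry e : a) used.add(e.name);

        int counter = 0;
        for (SegEntry e : b) {
            String name = e.name;
            if (used.contains(name)) {
                String fresh;
                do {
                    fresh = "_" + Integer.toString(counter++, Character.MAX_RADIX);
                } while (taken.contains(fresh));
                taken.add(fresh);
                renames.put(name, fresh);
                name = fresh;
            }
            used.add(name);
            merged.add(new SegEntry(name, e.docCount));
        }
        return merged;
    }
}
```

Merging A = [_0 (10 docs), _1 (5)] with B = [_0 (7), _2 (3)] gives [_0, _1, _3, _2], with B's _0 renamed to _3 because _0, _1 and _2 are all already in use.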

> Also what should happen with
> the index from which the segments are taken? Should the shared
> segments be copied between indexes?

I can simply destroy the original index once I've created a fake-merged
one.  I'm not sure what you mean by shared segments.  If I have two
indices, A and B, then each of them will have its own set of segments
with no segments in common.

> It's possible to share segments between indexes when the file system
> allows files to be present in multiple directories.

Oh, are you saying that I could just leave segments where they are and
use something like symlinks to point to them from a new index?

e.g.
A: <index files for A>
B: <index files for B>
C: <symlinks to index files for A>
   <symlinks to index files for B>
   <segments file with segment names for A and B>

?
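
A sketch of building the C layout with java.nio.file symbolic links (Java 7+, and only where the OS and file system permit symlinks). linkSegments is a hypothetical helper: it links only files whose names start with "_" (the per-segment files), applies any segment renames to the link names, and does not write C's segments file, which would still have to list A's and B's segments with their sizes.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class SymlinkIndex {
    /**
     * Hypothetical helper: creates dest if needed and, for every file in src
     * whose name starts with "_" (per-segment files such as _0.fdt), creates a
     * symbolic link in dest pointing at the original. If the file's segment
     * name (the part before the first '.') is a key in renames, the link uses
     * the new segment name instead. The segments file is not touched.
     */
    static void linkSegments(Path src, Path dest, Map<String, String> renames)
            throws IOException {
        Files.createDirectories(dest);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src, "_*")) {
            for (Path f : files) {
                String fileName = f.getFileName().toString();
                int dot = fileName.indexOf('.');
                String segment = dot < 0 ? fileName : fileName.substring(0, dot);
                String suffix = dot < 0 ? "" : fileName.substring(dot);
                String linkName = renames.getOrDefault(segment, segment) + suffix;
                // Absolute target so the link does not depend on dest's location.
                Files.createSymbolicLink(dest.resolve(linkName), f.toAbsolutePath());
            }
        }
    }
}
```

Calling it once for A with no renames and once for B with the renames from the clash check would populate C. Deleting a file in C afterwards removes only the link, not the original in A or B.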

Thanks,
Otis


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

