lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject RE: Faking index merge by modifying segments file?
Date Wed, 02 Nov 2005 11:14:03 GMT
Hello,

--- Robert Engels <rengels@ix.netcom.com> wrote:

> Problem is the terms need to be sorted in a single segment.

Are you referring to Term Dictionary (.tis and .tii files as described
at http://lucene.apache.org/java/docs/fileformats.html )?  If so, is
that really true?

I don't have an optimized Lucene multi-file index handy to look at, but
.tis and .tii files are "per segment" files, so wouldn't a set of .tis
and .tii files from multiple indices be equivalent to a set of .tis and
.tii files from multiple segments of a single index?

For example, if we have two indices, A and B, both optimized, we have:

A: segA.tis   (this may contain terms bar and foo)
   segA.tii
   ...
   segments   (this would list segA)

B: segB.tis   (this may contain terms piggy and bank)
   segB.tii
   ...
   segments   (this would list segB)

Wouldn't that be the same as a single index, say index C:

C: segA.tis   (this may contain terms bar and foo)
   segA.tii
   segB.tis   (this may contain terms piggy and bank)
   segB.tii
   ...
   segments   (this would list segments segA and segB)


That is really what I am talking about: take all index files of index A
and all index files of segment B and stick them in a new index dir for
a new index C.  Then open segments files of index A and index B, pull
out segment names and other information from there, and write a new
segments file with that information in index dir for that new index C.

This sounds like it should be possible, except for docId clashes - if
index A had a document with Id 100 and index B also has a document with
Id 100, after my index file copying, index C will end up having 2
documents with Id 100, and that won't work.  So, documents in C would
have to be renumbered (re-assigned Ids), as they get renumbered during
optimization, but without rewriting all index files in index C.

Does this sound right?

Also, I may not need to actually copy/move files around, if I just make
use of sym/hard links.

Thanks,
Otis


> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, November 01, 2005 1:52 AM
> To: java-dev@lucene.apache.org
> Subject: Faking index merge by modifying segments file?
> 
> 
> Hello,
> 
> I spent most of today talking to some people about Lucene, and one of
> them said how they would really like to have an "instantaneous index
> merge", and how he is thinking he could achieve that by simply
> opening
> segments file of one index, and adding segment names of the other
> index/indices, plus adjusting the segment size (SegSize in
> fileformats.html), thus creating a single (but unoptimized) index.
> 
> Any reactions to that?
> 
> I imagine this isn't quite that simple to implement, as one would
> have
> to renumber all documents, in order to avoid having multiple
> documents
> with the same document id.
> 
> Can anyone think of any other problems with this approach, or perhaps
> offer ideas for possible document renumbering?
> 
> Thanks,
> Otis
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message