lucene-dev mailing list archives

From David Balmain <dbalmain...@gmail.com>
Subject Re: Faking index merge by modifying segments file?
Date Wed, 02 Nov 2005 11:58:16 GMT
> This sounds like it should be possible, except for docId clashes - if
> index A had a document with Id 100 and index B also has a document with
> Id 100, after my index file copying, index C will end up having 2
> documents with Id 100, and that won't work.  So, documents in C would
> have to be renumbered (re-assigned Ids), as they get renumbered during
> optimization, but without rewriting all index files in index C.
>
> Does this sound right?
>

As Paul Elschot already mentioned, the document ids aren't stored in
the index. A document id is really just the position of the document
in its segment, and the doc ids for the whole index are computed
dynamically by the index reader, which offsets each segment's
positions by the number of documents in the segments before it. So no
renumbering is necessary.
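
To make that concrete, here is a minimal sketch of the offset
arithmetic. The class and method names are made up for illustration
and are not Lucene's API; the only assumption is that a reader keeps a
running total of the document counts of the preceding segments.

```java
public class DocIdSketch {
    // Hypothetical helper: starts[i] is the index-wide id of the first
    // document in segment i; starts[n] is the total document count.
    public static int[] segmentStarts(int[] docCounts) {
        int[] starts = new int[docCounts.length + 1];
        for (int i = 0; i < docCounts.length; i++) {
            starts[i + 1] = starts[i] + docCounts[i];
        }
        return starts;
    }

    // Index-wide id = base offset of the segment + position in the segment.
    public static int globalId(int[] starts, int segment, int localDoc) {
        return starts[segment] + localDoc;
    }

    public static void main(String[] args) {
        // Segment from index A holds 150 docs, segment from index B holds 200.
        int[] starts = segmentStarts(new int[] {150, 200});
        System.out.println(globalId(starts, 0, 100)); // prints 100
        System.out.println(globalId(starts, 1, 100)); // prints 250
    }
}
```

Both segments have a document at position 100, but they come out as
100 and 250 in the combined index, so nothing on disk has to change.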

> Also, I may not need to actually copy/move files around, if I just make
> use of sym/hard links.
>

Sure, as long as the other index isn't being updated any more. One
thing to note, though: as well as making sure that none of the
filenames clash between the indexes, you'll have to adjust the counter
variable in SegmentInfos so that newly created segment files won't
clash with existing ones.
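
A rough sketch of both checks follows. It assumes segment names have
the form "_" followed by the counter value in base 36 (for example
"_a" for 10); please verify that against your Lucene version. The
class and method names are hypothetical and not part of Lucene.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentCounterSketch {
    // Assumption: a segment name is "_" + the counter value in base 36.
    public static int counterOf(String segmentName) {
        return Integer.parseInt(segmentName.substring(1), Character.MAX_RADIX);
    }

    // Smallest counter value that cannot collide with any listed segment.
    public static int safeCounter(List<String> segmentNames) {
        int max = -1;
        for (String name : segmentNames) {
            max = Math.max(max, counterOf(name));
        }
        return max + 1;
    }

    // Segment names present in both indexes; must be empty before linking.
    public static Set<String> clashes(List<String> a, List<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common;
    }

    public static void main(String[] args) {
        List<String> a = List.of("_1", "_2");
        List<String> b = List.of("_2", "_1z");
        System.out.println(clashes(a, b));                     // prints [_2]
        System.out.println(safeCounter(List.of("_a", "_1z"))); // prints 72
    }
}
```

The counter you write back should be the larger of this value and the
counter already stored in the target index's segments file.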

Regards,
Dave

> Thanks,
> Otis
>
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Tuesday, November 01, 2005 1:52 AM
> > To: java-dev@lucene.apache.org
> > Subject: Faking index merge by modifying segments file?
> >
> >
> > Hello,
> >
> > I spent most of today talking to some people about Lucene, and one of
> > them said he would really like to have an "instantaneous index
> > merge", and that he thinks he could achieve it by simply opening the
> > segments file of one index and adding the segment names of the other
> > index/indices, plus adjusting the segment size (SegSize in
> > fileformats.html), thus creating a single (but unoptimized) index.
> >
> > Any reactions to that?
> >
> > I imagine this isn't quite that simple to implement, as one would
> > have to renumber all documents, in order to avoid having multiple
> > documents with the same document id.
> >
> > Can anyone think of any other problems with this approach, or perhaps
> > offer ideas for possible document renumbering?
> >
> > Thanks,
> > Otis
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org