lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Reading a v2 index in v4
Date Mon, 09 Jun 2014 16:04:17 GMT
Hi,

There is a way to make this work, and it is the "official" way to do it: your application
is already on Lucene 3.6, so why not simply use the IndexUpgrader class that ships with
Lucene 3.6? It upgrades your users' existing indexes (even ones dating back to version 1.0)
to the latest 3.6 file format. After that, the index is readable with Lucene 4.x (but it
will not be with Lucene 5, aka trunk). When your users then move to Lucene 4, they can
read the indexes. Ideally, you would also run the 4.x IndexUpgrader on the indexes the first
time they are opened with the newer version, so that they end up in the 4.x format.

IndexUpgrader is in fact built on a merge policy that changes what IndexWriter#forceMerge(1)
does: it always merges all segments that are in an older format, even if the index already
consists of only one segment. So you can also instantiate an IndexWriter as usual and set
UpgradeIndexMergePolicy on the IndexWriterConfig. This merge policy is just a wrapper around
another one, such as the default TieredMergePolicy.
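
The two routes described above could look roughly like this. It is only a sketch against the
Lucene 3.6 API: the class name UpgradeSketch and its method names are made up for
illustration, and the KeywordAnalyzer is just a placeholder because no documents are analyzed
during a pure upgrade. Back up the index first, since the upgrade rewrites segments in place.

```java
import java.io.IOException;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.UpgradeIndexMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

// Hypothetical helper class; assumes Lucene 3.6 on the classpath.
class UpgradeSketch {

    // Route 1: let IndexUpgrader rewrite all old-format segments.
    static void upgradeWithIndexUpgrader(Directory dir) throws IOException {
        new IndexUpgrader(dir, Version.LUCENE_36).upgrade();
    }

    // Route 2: configure an IndexWriter by hand with the upgrade merge policy.
    static void upgradeWithMergePolicy(Directory dir) throws IOException {
        IndexWriterConfig iwc =
                new IndexWriterConfig(Version.LUCENE_36, new KeywordAnalyzer());
        // Wrap whatever merge policy the config already has (TieredMergePolicy by default).
        iwc.setMergePolicy(new UpgradeIndexMergePolicy(iwc.getMergePolicy()));
        IndexWriter writer = new IndexWriter(dir, iwc);
        try {
            // With the wrapper in place, this merges every segment in an older format.
            writer.forceMerge(1);
        } finally {
            writer.close();
        }
    }
}
```

A caller would typically open the directory with FSDirectory.open(new File(path)) and pass it
to one of the two methods. Route 2 is mostly useful if the application needs its own
IndexWriterConfig settings during the upgrade.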

About the technical problem: passing a custom codec to the 4.x IndexReader would not be
enough. Those old indexes do not support all the semantics that Lucene 3 and 4 offer, and
a lot of code in IndexReader and SegmentCoreReaders cannot handle such old indexes.
Even if you could make it work, the main issue (as with the 3.x codec, for example) is the
order of terms, which changed from UTF-16 to UTF-8 order.
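
To illustrate the term order issue: the two orders only differ when supplementary characters
(stored as surrogate pairs in UTF-16) are compared with BMP characters above U+D7FF. The
following is a small stand-alone sketch using only the JDK, not Lucene code; the class name
TermOrderDemo and the two sample strings are chosen purely for illustration.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone demo, no Lucene involved; names are illustrative only.
class TermOrderDemo {

    // UTF-16 code unit order, which is what String.compareTo implements.
    static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    // UTF-8 order: compare the encoded bytes as unsigned values.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";        // U+FF5E, UTF-8 bytes EF BD 9E
        String supp = "\uD83D\uDE00"; // U+1F600, UTF-8 bytes F0 9F 98 80

        // UTF-16: the lead surrogate 0xD83D is smaller than 0xFF5E,
        // so the supplementary character sorts first.
        System.out.println("UTF-16: bmp > supp is " + (compareUtf16(bmp, supp) > 0));

        // UTF-8: 0xEF is smaller than 0xF0, so the BMP character sorts first.
        System.out.println("UTF-8:  bmp < supp is " + (compareUtf8(bmp, supp) < 0));
    }
}
```

Both lines print "true", so the same pair of terms sorts in opposite order under the two
schemes. A reader that expects one order and walks a term dictionary written in the other
can miss terms when seeking.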

If you really want to read older indexes, the following might work:
- Clone the Lucene 3.x codec to a private package name and change it to support older indexes
(this will be very hard).
- Add META-INF metadata to make Lucene 4 load *your* custom Lucene 3.x codec from the
classpath instead of the shipped one. The codec must have the name "Lucene3x" (although it
is also used for older indexes).
But I am still not sure whether this works completely, because IndexReader and IndexWriter may
throw IndexFormatTooOldException before the codec can actually kick in!
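
For the META-INF step, Lucene 4 discovers codecs through the usual Java service provider
mechanism, so the metadata is a one-line text file in your jar. The file name below is the
one Lucene 4 uses for codecs; the class name in it is purely hypothetical and stands for your
cloned codec, which would have to report "Lucene3x" as its name:

```
# File: META-INF/services/org.apache.lucene.codecs.Codec
com.example.legacy.LegacyLucene3xCodec
```

Whether your codec then wins over the shipped one with the same name depends on classpath
order, so treat this as an untested assumption.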

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Trejkaz [mailto:trejkaz@trypticon.org]
> Sent: Monday, June 09, 2014 2:54 PM
> To: Lucene Users Mailing List
> Subject: Re: Reading a v2 index in v4
> 
> On Mon, Jun 9, 2014 at 10:17 PM, Adrien Grand <jpountz@gmail.com>
> wrote:
> > Hi,
> >
> > It is not possible to read 2.x indices from Lucene 4, even with a
> > custom codec. For instance, Lucene 4 needs to hook into
> > SegmentInfos.read to detect old 3.x indices and force the use of the
> > Lucene3x codec since these indices don't expose what codec has been
> > used to write them.
> 
> Rats. I was wondering how the Lucene3x codec worked, but now I know. I
> was hoping codecs were going to be more flexible than that, but it looks like
> nobody considered the possibility that I might want to pass a Codec into my
> IndexReader. :(
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

