lucene-java-user mailing list archives

From Trejkaz <>
Subject Re: How do I write in 3.x format to an upgraded index using Lucene 4.10
Date Wed, 01 Feb 2017 01:53:01 GMT
> If we take our old 3.x index and apply IndexUpgrader to it, we end up with a 4.10 index.
> There are several Lucene 4.x files created in the index directory and no errors are thrown.
> However, it appears that the index data is still in the 3.x format, namely it remains:
> "thanks", "coming"
> and not:
> "thanks", <pim>, "coming"

Well, this is a different thing really. IndexUpgrader rewrites the index in the
4.x format, but it does not re-analyse any documents, so the terms and
positions in the postings are still exactly what the 3.x analysis produced.
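To make the "new format, old analysis" point concrete, here is a toy model in plain Java (not Lucene's API) of stop-word removal with and without position gaps. The class, the method names and the stop list are made up for illustration. The two modes only mirror the two shapes in the quoted output, with the `<pim>` presumably being the gap a removed stop word leaves behind; the sketch makes no claim about which Lucene defaults produce which shape. An upgrade of the file format leaves these recorded positions as they were.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of stop-word removal and term positions; not Lucene's API.
class StopGapDemo {
    // Hypothetical stop list, just big enough for the example sentence.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("for", "the", "a"));

    /** A term plus the position it would be recorded at in the postings. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /**
     * Drops stop words. With keepGaps=true a removed word still uses up a
     * position ("thanks", <gap>, "coming"); with keepGaps=false the remaining
     * terms are packed together ("thanks", "coming").
     */
    static List<Posting> analyze(String text, boolean keepGaps) {
        List<Posting> out = new ArrayList<>();
        int position = -1;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) {
                    position++; // leave a hole where the stop word was
                }
                continue;
            }
            position++;
            out.add(new Posting(word, position));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("thanks for coming", false)); // [thanks@0, coming@1]
        System.out.println(analyze("thanks for coming", true));  // [thanks@0, coming@2]
    }
}
```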

So this whole thing is really just a "make sure to query with the same
analyser you used to index" problem. If you indexed using a Lucene 3
analyser, you should keep using that same v3 analyser when you query
against the index in Lucene 4.
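Why a mismatch bites: an exact phrase match needs the query terms to sit at the same relative positions as the indexed ones. The following is a plain-Java sketch of that check with no slop, assuming each term occurs once per document; it is an illustration only, not how Lucene's PhraseQuery is actually implemented, and the names are hypothetical.

```java
import java.util.Map;

// Toy exact-phrase check over term positions; not Lucene's PhraseQuery implementation.
class PhraseMatchDemo {
    /**
     * True if every query term occurs in the document at the same offset
     * relative to its query position, i.e. the terms line up exactly with no
     * slop. Assumes each term occurs at most once in the document.
     */
    static boolean phraseMatches(Map<String, Integer> docPositions,
                                 Map<String, Integer> queryPositions) {
        Integer expectedOffset = null;
        for (Map.Entry<String, Integer> q : queryPositions.entrySet()) {
            Integer docPos = docPositions.get(q.getKey());
            if (docPos == null) {
                return false; // term missing from the document
            }
            int offset = docPos - q.getValue();
            if (expectedOffset == null) {
                expectedOffset = offset;
            } else if (expectedOffset != offset) {
                return false; // terms do not line up the way the query expects
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Document indexed without gaps, like the quoted 3.x output.
        Map<String, Integer> doc = Map.of("thanks", 0, "coming", 1);
        // Query analysed the same way: positions line up.
        System.out.println(phraseMatches(doc, Map.of("thanks", 0, "coming", 1))); // true
        // Query analysed with a gap for the stop word: no match.
        System.out.println(phraseMatches(doc, Map.of("thanks", 0, "coming", 2))); // false
    }
}
```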

So the usual rules apply:
  * Beware of Version.LATEST/LUCENE_CURRENT. Always pass the exact
version you indexed with, and keep using it.
  * If Lucene removes support for some Version you were using, don't
update the Version you're using. Instead, take a copy of the
Tokenizer/TokenFilter you were using from the older version and port
it to work on the new version. Maintain these frozen-off analysis
components forever.
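One way to keep a frozen-off chain honest over many upgrades is a golden-output check: record what the chain produced for some sample inputs and fail loudly if that ever changes. This is a plain-Java sketch of the idea; `FrozenAnalysisCheck`, `legacyStandIn` and the sample data are hypothetical, and the stand-in function only represents wherever your ported copy lives. It compares term lists only; since position gaps are exactly what matters in this thread, a real check should record positions (and probably offsets) too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical golden-output check for a frozen analysis chain; plain Java, not Lucene's API.
class FrozenAnalysisCheck {
    /**
     * Runs each recorded input through the analyser and throws if any output
     * differs from what was recorded when the index was built.
     */
    static void assertUnchanged(Function<String, List<String>> analyser,
                                Map<String, List<String>> golden) {
        for (Map.Entry<String, List<String>> e : golden.entrySet()) {
            List<String> actual = analyser.apply(e.getKey());
            if (!actual.equals(e.getValue())) {
                throw new IllegalStateException("analysis of '" + e.getKey()
                        + "' changed: expected " + e.getValue() + " but got " + actual);
            }
        }
    }

    /** Stand-in for the frozen copy: lowercase, split on whitespace, drop "for". */
    static List<String> legacyStandIn(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.equals("for")) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> golden = new LinkedHashMap<>();
        golden.put("Thanks for coming", Arrays.asList("thanks", "coming"));
        assertUnchanged(FrozenAnalysisCheck::legacyStandIn, golden);
        System.out.println("frozen analysis unchanged");
    }
}
```

Running something like this in CI after every Lucene upgrade turns "maintain forever" into a test that fails instead of a silent change in search results.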

That said, we didn't experience any problems like this going from 3 to
4. What we hit instead were obscure cases where Lucene itself did not
maintain backwards compatibility, e.g. places where the older behaviour
was not preserved despite passing in a Version object. IIRC, the change
to the term length limits was one of these. In those situations,
freezing off a copy of the old behaviour mostly works.

Also, we don't use the "classic" query parser, but rather the
flexible one. If you're using the classic one, it may misbehave around
this in ways we never ran into with the flexible one.

> So we need a way to write documents in 3.x format (no <pim>), to our upgraded indexes,
> new indexes can use native 4.10 format.

It sounds like you just need to keep using the analyser you were
previously using, both for writing new documents to the upgraded
indexes and for querying them, possibly forever...
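Since upgraded indexes need the old analysis and new indexes can use the 4.10 one, the application has to know which is which, and use the same answer when adding documents and when parsing queries. A plain-Java sketch of that bookkeeping follows; the class, the enum labels and the index names are hypothetical, and it assumes you record the choice yourself at creation or upgrade time rather than expecting the index to tell you.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for choosing an analysis chain per index; not a Lucene API.
class PerIndexAnalysis {
    /** Labels for the two chains discussed in this thread; the names are made up. */
    enum Chain { LEGACY_3X, NATIVE_4_10 }

    private final Map<String, Chain> chainByIndex = new HashMap<>();

    /** A brand-new index can use the current 4.10 analysis. */
    void registerNewIndex(String indexName) {
        chainByIndex.put(indexName, Chain.NATIVE_4_10);
    }

    /**
     * An index run through IndexUpgrader keeps the 3.x analysis: the upgrade
     * rewrote the file format, not the postings.
     */
    void registerUpgradedIndex(String indexName) {
        chainByIndex.put(indexName, Chain.LEGACY_3X);
    }

    /** Ask this for both indexing and querying, so the two never diverge. */
    Chain chainFor(String indexName) {
        Chain chain = chainByIndex.get(indexName);
        if (chain == null) {
            throw new IllegalArgumentException("no analysis chain recorded for " + indexName);
        }
        return chain;
    }

    public static void main(String[] args) {
        PerIndexAnalysis registry = new PerIndexAnalysis();
        registry.registerUpgradedIndex("archive-2012");
        registry.registerNewIndex("archive-2017");
        System.out.println(registry.chainFor("archive-2012")); // LEGACY_3X
        System.out.println(registry.chainFor("archive-2017")); // NATIVE_4_10
    }
}
```

In a real system the mapping would need to be persisted; one possible place in 4.x is the commit user data (IndexWriter.setCommitData), but that is worth checking against the 4.10 javadocs before relying on it.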

