lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Can 2.3 read indexes created by 2.4?
Date Thu, 11 Mar 2010 20:00:42 GMT
Urgh, I failed to update the opening in fileformats.html (describing
what's changed on each version).  We also had a change in 3.0, from
removing compressed fields.  I'll fix...

But: 2.3 can't read indexes created with 2.4 (and in general older
Lucene releases very likely will not be able to read indexes created
by newer versions, when there's been an index format change).

Mike

On Thu, Mar 11, 2010 at 1:35 PM, Nathanael D. Jones
<nathanael.jones@gmail.com> wrote:
> Lucene 2.4 introduced a change not documented on the File Formats page
>
> *LUCENE-510: The index now stores strings as true UTF-8 bytes (previously it
> was Java's modified UTF-8). If any text, either stored fields or a token,
> has illegal UTF-16 surrogate characters, these characters are now silently
> replaced with the Unicode replacement character U+FFFD. This is a change to
> the index file format.*
> *(Marvin Humphrey via Mike McCandless)*
>
>
> Is there a reason this change isn't documened on the File Formats page of
> any 2.4+ doc release?
>
> From the 3.0.1 docs:
>
> *http://lucene.apache.org/java/3_0_1/fileformats.html*<http://lucene.apache.org/java/3_0_1/fileformats.html>
> *"**Compatibility notes are provided in this document, describing how file
> formats have changed from prior versions.*
>
> *In version 2.1, the file format was changed to allow lock-less commits (ie,
> no more commit lock). The change is fully backwards compatible: you can open
> a pre-2.1 index for searching or adding/deleting of docs. When the new
> segments file is saved (committed), it will be written in the new file
> format (meaning no specific "upgrade" process is needed). But note that once
> a commit has occurred, pre-2.1 Lucene will not be able to read the index.*
>
> *In version 2.3, the file format was changed to allow segments to share a
> single set of doc store (vectors & stored fields) files. This allows for
> faster indexing in certain cases. The change is fully backwards compatible
> (in the same way as the lock-less commits change in 2.1)."*
>
>
> But no mention of the unicode change in version 2.4... was it somehow
> forwards compatible?
>
> Can I read indexes created in 2.4 with a 2.3 compliant reader?
> (Specifically, I'm interested in creating indexes with Lucene 3 and reading
> them with CLucene on an iPhone.)
>
> Thanks,
>
> Nathanael
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message