lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes
Date Fri, 09 May 2008 10:00:55 GMT


Michael McCandless commented on LUCENE-510:

Whoa, I think you are correct Mark!

On inspecting my changes here, I think the bulk-merging of stored fields is to blame.  Specifically,
when we bulk merge the stored fields we fail to check whether the segments being merged are
in the pre-UTF8 format.  As a result, that code bulk-copies stored fields in the older format
into a file that claims to use the newer format.
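
A minimal sketch of the kind of check that is missing, to make the failure mode concrete. Everything here is hypothetical and for illustration only: the class, the `Segment` stand-in, and the format constants are assumptions, not names from the Lucene source or from the patch. The idea is simply that a raw byte copy is only safe when the source segment already uses the format the merged file declares; otherwise each document has to be decoded and rewritten.

```java
// Hypothetical sketch: none of these names come from Lucene's source.
final class BulkMergeGuard {
    // Assumed format markers, for illustration only.
    static final int FORMAT_PRE_UTF8 = 0; // string lengths counted in Java chars
    static final int FORMAT_UTF8 = 1;     // string lengths counted in bytes

    /** Minimal stand-in for a segment's stored-fields metadata. */
    static final class Segment {
        final String name;
        final int storedFieldsFormat;

        Segment(String name, int storedFieldsFormat) {
            this.name = name;
            this.storedFieldsFormat = storedFieldsFormat;
        }
    }

    /** A raw byte copy is safe only if the source already uses the target format. */
    static boolean canBulkCopy(Segment source, int targetFormat) {
        return source.storedFieldsFormat == targetFormat;
    }

    /** Picks the merge path for one source segment. */
    static String mergeStrategy(Segment source, int targetFormat) {
        return canBulkCopy(source, targetFormat) ? "bulk-copy" : "decode-and-rewrite";
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment("_0", FORMAT_PRE_UTF8),
            new Segment("_1", FORMAT_UTF8),
        };
        for (Segment s : segments) {
            System.out.println(s.name + " -> " + mergeStrategy(s, FORMAT_UTF8));
        }
    }
}
```

With the check in place, the older segment falls back to the slower decode-and-rewrite path instead of being copied byte-for-byte into a file that claims the newer format.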

This only affects trunk, not 2.3.

Thanks for being such a brave early-adopter trunk tester, Mark.  And, sorry :(

> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>                 Key: LUCENE-510
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Doug Cutting
>            Assignee: Michael McCandless
>             Fix For: 2.4
>         Attachments: LUCENE-510.patch, LUCENE-510.take2.patch,, strings.diff,
> We should change the format of strings written to indexes so that the length of the string
> is in bytes, not Java characters.  This issue has been discussed at:
> We must increment the file format number to indicate this change.  At least the format
> number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until after 2.0 is
> released, to minimize incompatible changes between 1.9 and 2.0 (other than removal of deprecated
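
To illustrate why the two length conventions in the description above are not interchangeable, here is a small sketch comparing a string's length in Java chars with its length in encoded bytes. It uses the JDK's standard UTF-8 encoder as an assumption; it does not claim to reproduce the exact byte encoding Lucene writes on disk, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; uses the JDK's standard UTF-8 encoder only.
final class StringLengthDemo {
    /** Length as counted by Java: UTF-16 code units. */
    static int lengthInChars(String s) {
        return s.length();
    }

    /** Length of the string once encoded as standard UTF-8. */
    static int lengthInUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc",          // ASCII only: 3 chars, 3 bytes
            "\u00e9",       // e with acute accent: 1 char, 2 bytes
            "\uD83D\uDE00", // U+1F600 as a surrogate pair: 2 chars, 4 bytes
        };
        for (String s : samples) {
            System.out.println(lengthInChars(s) + " chars, " + lengthInUtf8Bytes(s) + " bytes");
        }
    }
}
```

A reader that expects a byte count but is handed a char count (or the reverse) will read the wrong amount of data for any non-ASCII string, which is why a change like this needs a new file format number.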

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
