lucene-dev mailing list archives

From "Marvin Humphrey (JIRA)" <>
Subject [jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes
Date Mon, 08 May 2006 21:03:22 GMT

Marvin Humphrey commented on LUCENE-510:

The following patch...

  * Changes Lucene to use byte counts as the prefix for all written Strings
  * Changes Lucene to write standard UTF-8 rather than Modified UTF-8 
  * Adds the new test classes MockIndexOutput and TestIndexOutput
  * Increases the number of tests in TestIndexInput
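
A minimal Java sketch of the first two changes in the list: a string is stored as a byte-count prefix followed by standard UTF-8 bytes. It assumes the prefix is still written as a VInt (7 bits per byte, low-order group first, high bit set on every byte except the last), as Lucene's existing writeString does for its length. The class and method names (ByteCountStrings, writeString, readString, writeVInt, readVInt) are hypothetical and write to in-memory streams; this is not the code in the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: VInt byte count + standard UTF-8. Not the LUCENE-510 patch itself.
class ByteCountStrings {

    // VInt: 7 bits per byte, low-order bits first, high bit marks "more bytes follow".
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    static int readVInt(ByteArrayInputStream in) throws IOException {
        int b = readByte(in);
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte(in);
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    private static int readByte(ByteArrayInputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new IOException("unexpected end of input");
        return b;
    }

    // The prefix is the number of UTF-8 bytes, not s.length() (the number of Java chars).
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A reader learns the exact byte length up front, so it can allocate once (or skip the string).
    static String readString(ByteArrayInputStream in) throws IOException {
        int numBytes = readVInt(in);
        byte[] utf8 = new byte[numBytes];
        if (in.read(utf8, 0, numBytes) != numBytes) throw new IOException("truncated string");
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String s = "a\u00e9\u20ac\uD83D\uDE00"; // 5 Java chars, 10 UTF-8 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, s);
        byte[] encoded = out.toByteArray();
        System.out.println(encoded[0]);     // 10 (the byte count, one VInt byte)
        System.out.println(encoded.length); // 11 (1 prefix byte + 10 data bytes)
        System.out.println(readString(new ByteArrayInputStream(encoded)).equals(s)); // true
    }
}
```

Keeping the VInt prefix means short strings still cost a single length byte; only the meaning of the number changes, from Java chars to bytes.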

The patch also slows Lucene down: indexing takes around a 20% speed hit.  It would be possible
to submit a patch with a smaller impact on performance, but this one is already over
700 lines long, and its goal is to achieve standard UTF-8 compliance and modify the definition
of Lucene strings as simply and reliably as possible.  Optimization patches that build upon
this one can now be submitted.

Marvin Humphrey
Rectangular Research

> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>          Key: LUCENE-510
>          URL:
>      Project: Lucene - Java
>         Type: Improvement

>   Components: Store
>     Versions: 2.1
>     Reporter: Doug Cutting
>      Fix For: 2.1

> We should change the format of strings written to indexes so that the length of the string
> is in bytes, not Java characters.  This issue has been discussed at:
> We must increment the file format number to indicate this change.  At least the format
> number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until after 2.0 is
> released, to minimize incompatible changes between 1.9 and 2.0 (other than removal of deprecated
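
To see why a length in Java characters is not a length in bytes, the following hypothetical Java snippet measures one string three ways. It uses the JDK's DataOutputStream.writeUTF as a stand-in for Modified UTF-8; Lucene's own encoder is not shown, and the class name StringLengths is illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the same string measured in Java chars, UTF-8 bytes, and Modified UTF-8 bytes.
class StringLengths {

    // Bytes produced by standard UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Payload bytes produced by Modified UTF-8 (DataOutputStream.writeUTF),
    // excluding the 2-byte length header that writeUTF writes first.
    static int modifiedUtf8Bytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) throws IOException {
        String s = "a\u00e9\u20ac\uD83D\uDE00"; // 'a', e-acute, euro sign, one character outside the BMP
        System.out.println(s.length());                  // 5: the last character is a surrogate pair
        System.out.println(utf8Bytes(s));                // 10: 1 + 2 + 3 + 4
        System.out.println(modifiedUtf8Bytes(s));        // 12: each surrogate gets its own 3-byte sequence
        System.out.println(modifiedUtf8Bytes("\u0000")); // 2: NUL is written as 0xC0 0x80
    }
}
```

With a char-count prefix, a reader must decode the string to find out how many bytes it occupies; a byte-count prefix makes that number available before decoding.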

This message is automatically generated by JIRA.