lucene-dev mailing list archives

From "Marvin Humphrey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes
Date Mon, 08 May 2006 21:03:22 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-510?page=comments#action_12378519 ] 

Marvin Humphrey commented on LUCENE-510:
----------------------------------------

The following patch...

  * Changes Lucene to use byte counts as the prefix to all written Strings
  * Changes Lucene to write standard UTF-8 rather than Modified UTF-8 
  * Adds the new test classes MockIndexOutput and TestIndexOutput
  * Increases the number of tests in TestIndexInput

It also slows Lucene down -- indexing takes roughly a 20% speed hit.  It would be possible
to submit a patch with a smaller impact on performance, but this one is already over
700 lines long, and its goal is to achieve standard UTF-8 compliance and modify the definition
of Lucene strings as simply and reliably as possible.  Optimization patches that build upon
this one can now be submitted.
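
For readers unfamiliar with the format change, here is a minimal sketch of what
a byte-count-prefixed, standard UTF-8 string could look like on disk.  It is an
illustration only, not code from the patch: the class name ByteCountStringSketch
and its helper methods are hypothetical, and it assumes the length prefix is a
variable-length int (7 bits per byte, high bit set when more bytes follow).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; none of these names come from the patch.
class ByteCountStringSketch {

    // Assumed prefix encoding: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Writes the length in BYTES, followed by the standard UTF-8 bytes.
    static byte[] writeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Reads the byte count, then decodes exactly that many bytes.
    static String readString(byte[] data) {
        int pos = 0;
        int b = data[pos++];
        int len = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos++];
            len |= (b & 0x7F) << shift;
        }
        return new String(data, pos, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1D11E is one code point but two Java chars (a surrogate pair).
        // Standard UTF-8 encodes it in 4 bytes; Modified UTF-8 would encode
        // each surrogate separately, in 3 bytes apiece.
        String clef = "\uD834\uDD1E";
        byte[] encoded = writeString(clef);
        System.out.println("Java chars:   " + clef.length());
        System.out.println("UTF-8 bytes:  " + (encoded.length - 1));
        System.out.println("Round trip ok: " + clef.equals(readString(encoded)));
    }
}
```

With a byte-count prefix, a reader can skip or copy a string without decoding
it, which a character-count prefix does not allow.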

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>          Key: LUCENE-510
>          URL: http://issues.apache.org/jira/browse/LUCENE-510
>      Project: Lucene - Java
>         Type: Improvement

>   Components: Store
>     Versions: 2.1
>     Reporter: Doug Cutting
>      Fix For: 2.1

>
> We should change the format of strings written to indexes so that the length of the string
> is in bytes, not Java characters.  This issue has been discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least the format
> number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until after 2.0 is
> released, to minimize incompatible changes between 1.9 and 2.0 (other than removal of deprecated
> features).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

