lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-510) IndexOutput.writeString() should write length in bytes
Date Tue, 04 Mar 2008 20:57:40 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-510:
--------------------------------------

    Attachment: LUCENE-510.patch

Attached patch.

I modernized Marvin's original patch and added full backwards
compatibility to it so that old indices can be opened for reading or
writing.  New segments are written in the new format.
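
As a rough illustration of what reading both formats can look like (a
sketch only, not the code in the attached patch): the reader picks how to
interpret the length prefix based on a format marker. The class name and
the two FORMAT_* constants are hypothetical, and standard UTF-8 is assumed
for the new format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FormatDispatchSketch {
    // Hypothetical markers; the real format numbers are not given in this message.
    static final int FORMAT_CHAR_LENGTH = 0; // old: prefix counts Java chars
    static final int FORMAT_BYTE_LENGTH = 1; // new: prefix counts encoded bytes

    // Variable-length int: 7 data bits per byte, high bit set means "more follows".
    static int readVInt(ByteBuffer in) {
        byte b = in.get();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    static String readString(ByteBuffer in, int format) {
        int length = readVInt(in);
        if (format == FORMAT_BYTE_LENGTH) {
            // New format: the prefix says exactly how many bytes to consume.
            byte[] bytes = new byte[length];
            in.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
        // Old format: the prefix counts chars, so each char must be decoded
        // (1 to 3 bytes) before we know where the string ends.
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            int b = in.get() & 0xFF;
            if ((b & 0x80) == 0) {
                chars[i] = (char) b;
            } else if ((b & 0xE0) != 0xE0) {
                chars[i] = (char) (((b & 0x1F) << 6) | (in.get() & 0x3F));
            } else {
                chars[i] = (char) (((b & 0x0F) << 12)
                        | ((in.get() & 0x3F) << 6)
                        | (in.get() & 0x3F));
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // "caf\u00e9": 4 chars, 5 encoded bytes. Same payload, different prefix.
        byte[] oldBytes = {4, 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        byte[] newBytes = {5, 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        System.out.println(readString(ByteBuffer.wrap(oldBytes), FORMAT_CHAR_LENGTH));
        System.out.println(readString(ByteBuffer.wrap(newBytes), FORMAT_BYTE_LENGTH));
    }
}
```

Both calls decode the same string; only the meaning of the prefix differs,
which is why the reader needs to know which format a segment was written in.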

All tests pass.  I think it's close, but I still need to run performance
tests to measure the impact on indexing throughput.

I think future optimizations can carry the byte[] further, e.g., into
Term and FieldCache, as Yonik mentioned.  We could also change
DocumentsWriter to use byte[] for its terms storage, which would
improve RAM efficiency for single-byte (ASCII) content.
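
To make the RAM argument concrete, here is a small sketch (not
DocumentsWriter code; the class and method names are made up) comparing
the payload size of a term held as char[] with the same term held as
UTF-8 bytes. Object and array headers are ignored.

```java
import java.nio.charset.StandardCharsets;

class TermStorageSketch {
    // A Java char is 16 bits, so a char[] payload is 2 bytes per char.
    static int charArrayBytes(String term) {
        return term.length() * 2;
    }

    // UTF-8 uses 1 byte for ASCII and 2 or more bytes for everything else.
    static int utf8Bytes(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "lucene";
        String cjk = "\u65e5\u672c\u8a9e"; // three CJK chars, 3 bytes each in UTF-8
        System.out.println(ascii + ": char[]=" + charArrayBytes(ascii)
                + " utf8=" + utf8Bytes(ascii));
        System.out.println("cjk: char[]=" + charArrayBytes(cjk)
                + " utf8=" + utf8Bytes(cjk));
    }
}
```

ASCII terms halve in size, while terms made of 3-byte characters grow, so
the saving depends on the content being indexed.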

I also updated the TestBackwardsCompatibility test case to properly
test non-ASCII terms.



> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>                 Key: LUCENE-510
>                 URL: https://issues.apache.org/jira/browse/LUCENE-510
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Doug Cutting
>            Assignee: Michael McCandless
>         Attachments: LUCENE-510.patch, SortExternal.java, strings.diff, TestSortExternal.java
>
>
> We should change the format of strings written to indexes so that the length of the string
> is in bytes, not Java characters.  This issue has been discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least the format
> number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until after 2.0 is
> released, to minimize incompatible changes between 1.9 and 2.0 (other than removal of
> deprecated features).
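
The change described in the issue can be sketched as follows. This is a
minimal illustration, not the IndexOutput implementation: the class and
method names are hypothetical, and standard UTF-8 via
String.getBytes is assumed as the encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class LengthPrefixSketch {
    // Variable-length int: 7 data bits per byte, high bit set means "more follows".
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Proposed layout: the prefix is the number of encoded bytes that follow.
    static byte[] writeStringByteLength(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        byte[] written = writeStringByteLength(s);
        // A char-count prefix would be s.length(); the byte-count prefix is written[0].
        System.out.println("chars=" + s.length() + " bytePrefix=" + written[0]
                + " total=" + written.length);
        // prints "chars=4 bytePrefix=5 total=6"
    }
}
```

With a byte-count prefix, a reader can skip or bulk-copy a string without
decoding it, because the prefix says exactly how many bytes to consume.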

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

