lucene-dev mailing list archives

From "Marvin Humphrey (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-510) IndexOutput.writeString() should write length in bytes
Date Mon, 05 Jun 2006 01:56:30 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-510?page=all ]

Marvin Humphrey updated LUCENE-510:
-----------------------------------

    Attachment: SortExternal.java
                TestSortExternal.java

Greets,

I've ported KinoSearch's external sorting module to Java, along with its tests.  This class
is the linchpin for the KinoSearch merge model, as it allows serialized postings to be dumped
into a sort pool of effectively unlimited size.
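
For readers unfamiliar with the technique, here is a minimal sketch of a generic external
merge sort, assuming string items and temp-file runs.  It is not the attached
SortExternal.java; the class name ExternalSortSketch, the methods feed() and drain(), and the
maxBuffered threshold are all hypothetical and chosen only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical illustration of an external merge sort; not the attached SortExternal.java.
class ExternalSortSketch {
    private final int maxBuffered;                      // items held in memory before a run is flushed
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();  // sorted runs already written to disk

    ExternalSortSketch(int maxBuffered) { this.maxBuffered = maxBuffered; }

    // Add one item; memory use stays bounded because full buffers are flushed to disk.
    void feed(String item) {
        buffer.add(item);
        if (buffer.size() >= maxBuffered) flushRun();
    }

    // Sort the in-memory buffer and write it out as one run: an item count, then the items.
    private void flushRun() {
        if (buffer.isEmpty()) return;
        Collections.sort(buffer);
        try {
            Path run = Files.createTempFile("sortrun", ".bin");
            run.toFile().deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(run)))) {
                out.writeInt(buffer.size());
                for (String s : buffer) out.writeUTF(s);
            }
            runs.add(run);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    // One open run plus the item currently at its front.
    private static final class RunCursor implements Comparable<RunCursor> {
        private final DataInputStream in;
        private int remaining;
        String head;

        RunCursor(Path run) throws IOException {
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(run)));
            remaining = in.readInt();
            advance();
        }

        // Load the next item; returns false (and closes the file) when the run is exhausted.
        boolean advance() throws IOException {
            if (remaining == 0) { head = null; in.close(); return false; }
            head = in.readUTF();
            remaining--;
            return true;
        }

        @Override public int compareTo(RunCursor other) { return head.compareTo(other.head); }
    }

    // Flush whatever is still buffered, then k-way merge all runs in sorted order into the sink.
    void drain(Consumer<String> sink) {
        flushRun();
        try {
            PriorityQueue<RunCursor> heap = new PriorityQueue<>();
            for (Path run : runs) heap.add(new RunCursor(run));  // every run holds at least one item
            while (!heap.isEmpty()) {
                RunCursor smallest = heap.poll();
                sink.accept(smallest.head);
                if (smallest.advance()) heap.add(smallest);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would feed() items one at a time and then call drain() with a consumer.  Only
maxBuffered items plus one head item per run are in memory at any moment, which is what makes
the pool size effectively unlimited.  Serialized postings would presumably be byte arrays with
their own comparator rather than strings.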

At some point, I'll submit patches implementing the KinoSearch merge model in Lucene.  I'm
reasonably confident that it will more than make up for the index-time performance hit caused
by using bytecounts as string headers.

Thematically, this class belongs in org.apache.lucene.util, and that's where I've put it for
now.  The classes that will use it are in org.apache.lucene.index, so if it stays in util,
it will have to be public.  However, it shouldn't be part of Lucene's documented public API.
The process by which Lucene's docs are generated is not clear to me, so access control advice
would be appreciated.

There are a number of other areas where this code could stand review, especially considering
my relatively limited experience using Java.  I'd single out exception handling and thread
safety, but of course anything else is fair game.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>          Key: LUCENE-510
>          URL: http://issues.apache.org/jira/browse/LUCENE-510
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Store
>     Versions: 2.1
>     Reporter: Doug Cutting
>      Fix For: 2.1
>  Attachments: SortExternal.java, TestSortExternal.java, strings.diff
>
> We should change the format of strings written to indexes so that the length of the string
> is in bytes, not Java characters.  This issue has been discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least the format
> number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until after 2.0 is
> released, to minimize incompatible changes between 1.9 and 2.0 (other than removal of
> deprecated features).
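
To make the bytes-versus-characters distinction concrete, here is a small sketch, assuming
UTF-8 encoding and a fixed 4-byte length header.  The class ByteLengthStringSketch and its
methods are hypothetical; Lucene's IndexOutput writes the length as a variable-length int, and
this sketch does not reproduce that format or the patch in strings.diff.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a byte-count string header; not Lucene's actual file format.
class ByteLengthStringSketch {
    // Length in Java chars (UTF-16 code units).
    static int charLength(String s) { return s.length(); }

    // Length in encoded bytes, which is what a byte-count header would record.
    static int byteLength(String s) { return s.getBytes(StandardCharsets.UTF_8).length; }

    // Write a 4-byte big-endian byte count (an assumption for simplicity), then the UTF-8 bytes.
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(utf8.length);
            out.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Read the byte count, then exactly that many bytes; no per-character decoding is needed
    // to know where the string ends.
    static String decode(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the string "na\u00EFve", charLength() gives 5 while byteLength() gives 6, because the
non-ASCII character takes two bytes in UTF-8.  With a byte count in the header, a reader can
skip or copy a string without decoding it, at the cost of computing the encoded length before
writing.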
