lucene-dev mailing list archives

From "Hiroaki Kawai (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1241) 0xffff char is not a string terminator
Date Tue, 25 Mar 2008 04:09:26 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581768#action_12581768 ]

Hiroaki Kawai commented on LUCENE-1241:
---------------------------------------

I think we should not use \uffff as a terminator in the Lucene library, regardless of the fact
that the Unicode standard allows it, because it is unnecessary.

Reading the commit log in the svn repository and the code base at revision 553235, I suspect that
termination with "\uffff" was introduced in revision 553236, modeled on the implementation of
java.text.CharacterIterator. Is that right? (java.text.CharacterIterator.DONE is a static constant
with the value "\uffff". The java.text.CharacterIterator interface supports internationalization
by providing bidirectional iteration over text, and we can tell whether we have reached the end
of a string by comparing each returned value with java.text.CharacterIterator.DONE.)
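
To illustrate why a sentinel comparison is fragile, here is a minimal sketch using only
java.text.CharacterIterator and java.text.StringCharacterIterator from the JDK. The class name
DoneSentinelDemo and its two helper methods are hypothetical names made up for this example;
they are not Lucene code. The first loop stops when it sees DONE, the second compares the
iterator index with the end index.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical demo class, not part of Lucene.
class DoneSentinelDemo {

    /** Counts chars by stopping at the first value equal to CharacterIterator.DONE. */
    public static int countUntilDone(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        int n = 0;
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            n++;
        }
        return n;
    }

    /** Counts chars by comparing the current index with the end index. */
    public static int countByIndex(String s) {
        CharacterIterator it = new StringCharacterIterator(s);
        int n = 0;
        for (it.first(); it.getIndex() < it.getEndIndex(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A string that really contains U+FFFF in the middle.
        String s = "ab\uffffcd";
        System.out.println(countUntilDone(s)); // prints 2: the scan stops early
        System.out.println(countByIndex(s));   // prints 5: the full length
    }
}
```

The index-based loop does not depend on the content of the string, which is the same point as
using the already known string length instead of a terminator.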

I came up with the idea of introducing a new class that implements CharSequence and Comparable,
has a good hashCode(), and uses the buffer of the original memory allocation (String, StringBuffer,
char[], CharBuffer, etc.).
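
A rough sketch of what such a class might look like follows. The name CharSlice, its factory
methods, and the choice of the same hash formula as java.lang.String are all assumptions made
for illustration; this is not the attached patch. It keeps a reference to the original
CharSequence plus an offset and a length, so the end is known from the length and no terminator
char is needed.

```java
import java.nio.CharBuffer;

// Hypothetical class for illustration only; not taken from LUCENE-1241.patch.
final class CharSlice implements CharSequence, Comparable<CharSlice> {
    private final CharSequence source; // original buffer, not copied
    private final int offset;
    private final int length;

    public CharSlice(CharSequence source, int offset, int length) {
        if (offset < 0 || length < 0 || offset > source.length() - length) {
            throw new IndexOutOfBoundsException();
        }
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    /** Wraps a whole String, StringBuffer, CharBuffer, or other CharSequence. */
    public static CharSlice of(CharSequence s) {
        return new CharSlice(s, 0, s.length());
    }

    /** Wraps part of a char[]; CharBuffer.wrap shares the array instead of copying it. */
    public static CharSlice of(char[] buf, int offset, int length) {
        return new CharSlice(CharBuffer.wrap(buf), offset, length);
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        return source.charAt(offset + index);
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        return new CharSlice(source, offset + start, end - start);
    }

    /** Lexicographic comparison by char value; a shorter prefix sorts first. */
    @Override public int compareTo(CharSlice other) {
        int n = Math.min(length, other.length);
        for (int i = 0; i < n; i++) {
            int d = charAt(i) - other.charAt(i);
            if (d != 0) return d;
        }
        return length - other.length;
    }

    @Override public boolean equals(Object o) {
        return o instanceof CharSlice && compareTo((CharSlice) o) == 0;
    }

    /** Same formula as String.hashCode(), computed over the slice only. */
    @Override public int hashCode() {
        int h = 0;
        for (int i = 0; i < length; i++) h = 31 * h + charAt(i);
        return h;
    }

    @Override public String toString() {
        return new StringBuilder(length).append(source, offset, offset + length).toString();
    }

    public static void main(String[] args) {
        char[] buf = "xxab\uffffcdyy".toCharArray();
        CharSlice slice = CharSlice.of(buf, 2, 5);
        System.out.println(slice.length());                          // prints 5
        System.out.println(slice.equals(CharSlice.of("ab\uffffcd"))); // prints true
    }
}
```

Because the slice shares the underlying buffer, it is only safe to use as a hash key while the
original buffer is not modified.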

> 0xffff char is not a string terminator
> --------------------------------------
>
>                 Key: LUCENE-1241
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1241
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Hiroaki Kawai
>         Attachments: LUCENE-1241.patch
>
>
> Current trunk index.DocumentWriter uses "\uffff" as a string terminator, but it should
> not, for several reasons. \uffff is not itself a terminator char, and we can't handle a string
> that really contains \uffff. Also, we can calculate the end char position in a character
> sequence from the string length, which we already know.
> However, I agree with using it for assertions, where "\uffff" is placed just after the end
> of a string in a char sequence.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

