lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2019) map unicode process-internal codepoints to replacement character
Date Thu, 29 Oct 2009 21:19:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771623#action_12771623
] 

Robert Muir commented on LUCENE-2019:
-------------------------------------

I think this code won't be so intrusive or hairy.
Here is the list of these code points in UTF-16 (surrogate pair) representation.
Note that for the > BMP points, the trail surrogate is always U+DFFE or U+DFFF.

BMP points:
{noformat}
\uFDD0-\uFDEF
\uFFFE
\uFFFF <-- already handled
{noformat}

> BMP points:
{noformat}
\uD83F\uDFFE
\uD83F\uDFFF
\uD87F\uDFFE
\uD87F\uDFFF
\uD8BF\uDFFE
\uD8BF\uDFFF
\uD8FF\uDFFE
\uD8FF\uDFFF
\uD93F\uDFFE
\uD93F\uDFFF
\uD97F\uDFFE
\uD97F\uDFFF
\uD9BF\uDFFE
\uD9BF\uDFFF
\uD9FF\uDFFE
\uD9FF\uDFFF
\uDA3F\uDFFE
\uDA3F\uDFFF
\uDA7F\uDFFE
\uDA7F\uDFFF
\uDABF\uDFFE
\uDABF\uDFFF
\uDAFF\uDFFE
\uDAFF\uDFFF
\uDB3F\uDFFE
\uDB3F\uDFFF
\uDB7F\uDFFE
\uDB7F\uDFFF
\uDBBF\uDFFE
\uDBBF\uDFFF
\uDBFF\uDFFE
\uDBFF\uDFFF
{noformat}
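
The list above follows a regular pattern, so a check does not need a lookup table: the BMP range U+FDD0-U+FDEF plus every code point whose low 16 bits are FFFE or FFFF. Below is a minimal Java sketch of that check and of the mapping to U+FFFD. The class and method names (NoncharacterMapper, isNoncharacter, mapNoncharacters) are hypothetical and are not from any Lucene patch. The sketch also maps U+FFFF, which the list marks as already handled.

```java
// Hypothetical helper for illustration only; not Lucene code.
final class NoncharacterMapper {
    // U+FFFD REPLACEMENT CHARACTER
    static final int REPLACEMENT = 0xFFFD;

    // True for the 66 code points listed above: U+FDD0..U+FDEF, plus the
    // last two code points of every plane (low 16 bits FFFE or FFFF).
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Walks the string by code point so surrogate pairs are decoded before
    // the check, and replaces every match with U+FFFD.
    // Unpaired surrogates are passed through unchanged.
    static String mapNoncharacters(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(isNoncharacter(cp) ? REPLACEMENT : cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Working on code points keeps the check to two comparisons. A char-level loop would instead have to remember the lead surrogate and then test whether the trail surrogate is U+DFFE or U+DFFF.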


> map unicode process-internal codepoints to replacement character
> ----------------------------------------------------------------
>
>                 Key: LUCENE-2019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2019
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Robert Muir
>            Priority: Minor
>
> A spinoff from LUCENE-2016.
> There are several process-internal codepoints in Unicode; we should not store these in the index.
> Instead they should be mapped to the replacement character (U+FFFD), so they can be used process-internally.
> An example of this is how Lucene Java currently uses U+FFFF process-internally: it can't be in the index or it will cause problems.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

