lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0
Date Sun, 29 Nov 2009 12:02:20 GMT


Simon Willnauer commented on LUCENE-2094:

bq. Why do you use Version.LUCENE_CURRENT for all predefined stop word sets (ok, they do not
need a match version, because they are already lowercased). 

1. They do not ignore case at all, so the version will not affect those sets.
2. They are private and we have full control over the sets. They are all lowercased (as
you figured correctly) and none of them contains any supplementary characters.
3. They are static and private, so passing any user-supplied version is not feasible.

bq. In my opinion the whole stuff is only needed for chararrayssets, which are not already
lowercased. So is there any chararrayset in lucene with predefined stop-words, that is not

Either way, whether or not the set is lowercased, the lowercasing is also applied to the values
checked against the set.
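
To illustrate why lowercasing matters for supplementary characters, here is a minimal sketch. It is not the code from the attached patch; the class and method names are hypothetical. It compares lowercasing one char at a time with lowercasing one code point at a time, using only standard java.lang.Character methods and the Deseret letter pair U+10400 / U+10428 as sample input.

```java
public class SupplementaryLowercase {

    // Lowercases one UTF-16 code unit at a time. Surrogate halves have no
    // case mapping, so supplementary characters pass through unchanged.
    static String lowerByChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Lowercases one code point at a time, so a surrogate pair is treated
    // as the single character it encodes.
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // DESERET CAPITAL LETTER LONG I and its lowercase counterpart.
        String upper = new String(Character.toChars(0x10400));
        String lower = new String(Character.toChars(0x10428));

        System.out.println(lowerByChar(upper).equals(lower));
        System.out.println(lowerByCodePoint(upper).equals(lower));
    }
}
```

With char-wise lowercasing the uppercase entry never becomes equal to its lowercase form, so a case-insensitive lookup based on it would presumably miss such entries; the code-point variant maps both to the same string.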

> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>                 Key: LUCENE-2094
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.3.3, 2.4, 2.4.1, 2.4.2,
2.9, 2.9.1, 2.9.2, 3.0, 3.0.1, 3.1
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>         Attachments: LUCENE-2094.patch, LUCENE-2094.txt, LUCENE-2094.txt, LUCENE-2094.txt
> CharArraySet does lowercasing if created with the corresponding flag. As a result,
 String / char[] values with Unicode 4 chars which are in the set cannot be retrieved in "ignorecase"

