lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2094) Prepare CharArraySet for Unicode 4.0
Date Mon, 30 Nov 2009 23:20:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783932#action_12783932 ]

Robert Muir commented on LUCENE-2094:
-------------------------------------

bq. But if this is so, you should have initialized the stop filter in the Persian analyzer with
a fixed "false". But it also used StopFilter.getEnablePositionIncrementsVersionDefault() and
used the version default. Should we fix this?

I don't think so. I think it's up to the user to decide how they want the search to work, even
in this example.
If they don't like the defaults for how PhraseQuery works, they can create an analyzer that
uses the StopFilter differently.

I don't think the issue is clear-cut for any given language; I think it always depends on how
your application works.
I mean, we add a hole for "the" in English, but in Bulgarian (LUCENE-2062) this is a suffix
attached to the end of a noun.
In Arabic it's always a prefix. I don't think we need to have options to add a position increment
gap if we stem the leading ال off an Arabic word.

I'm just trying to show some examples of why a user might want to change the defaults.
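
To make the "hole" concrete, here is a minimal stand-alone sketch of what enabling or disabling position increments does to token positions when a stop word is dropped. It does not use Lucene's StopFilter; the class StopPositions and its filter method are hypothetical names invented for illustration, and the real filter works on a token stream rather than a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation, not Lucene code.
class StopPositions {

    // Returns "term@position" for every token that survives stop-word removal.
    // With enablePositionIncrements = true, each removed stop word leaves a
    // hole, so the next surviving token skips a position.
    public static List<String> filter(List<String> tokens,
                                      Set<String> stopWords,
                                      boolean enablePositionIncrements) {
        List<String> out = new ArrayList<>();
        int position = -1; // the first surviving token lands on position 0
        int skipped = 0;   // stop words dropped since the last kept token
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            position += increment;
            skipped = 0;
            out.add(token + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("over", "the", "moon");
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        System.out.println(filter(tokens, stop, true));  // [over@0, moon@2]
        System.out.println(filter(tokens, stop, false)); // [over@0, moon@1]
    }
}
```

With the hole, an exact phrase query for "over moon" would not match this text; without it, it would. Which behavior is right depends on the application, which is the point being made above.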


> Prepare CharArraySet for Unicode 4.0
> ------------------------------------
>
>                 Key: LUCENE-2094
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2094
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.patch,
LUCENE-2094.patch, LUCENE-2094.patch, LUCENE-2094.txt, LUCENE-2094.txt, LUCENE-2094.txt
>
>
> CharArraySet does lowercasing if created with the corresponding flag. As a result,
> String / char[] entries containing Unicode 4 characters that are in the set cannot be
> retrieved in "ignorecase" mode.
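
The underlying problem can be sketched in plain Java, independent of the attached patches. The class and method names below are hypothetical and are not CharArraySet's actual code; the sketch only contrasts lowercasing one char at a time with lowercasing one code point at a time. It assumes that the lowercase form of a code point occupies the same number of chars as the original.

```java
// Hypothetical illustration, not the CharArraySet implementation.
class CodePointLowerCase {

    // Lowercases each 16-bit char on its own. The two surrogate halves of a
    // supplementary character have no case mapping, so they pass through
    // unchanged and the character is never lowercased.
    public static void lowerCaseByChar(char[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
    }

    // Lowercases whole code points, so surrogate pairs are handled as one unit.
    // Assumption: the lowercase code point needs as many chars as the original.
    public static void lowerCaseByCodePoint(char[] buf, int off, int len) {
        int limit = off + len;
        for (int i = off; i < limit; ) {
            int codePoint = Character.codePointAt(buf, i, limit);
            // toChars writes the result in place and returns the chars written.
            i += Character.toChars(Character.toLowerCase(codePoint), buf, i);
        }
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I; its lowercase is U+10428.
        char[] a = Character.toChars(0x10400);
        char[] b = Character.toChars(0x10400);
        lowerCaseByChar(a, 0, a.length);
        lowerCaseByCodePoint(b, 0, b.length);
        System.out.println(Integer.toHexString(Character.codePointAt(a, 0))); // 10400
        System.out.println(Integer.toHexString(Character.codePointAt(b, 0))); // 10428
    }
}
```

A set that normalizes entries with one approach and lookups with the other would fail to find such entries, which matches the symptom described in the issue.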

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



