lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark
Date Thu, 02 Feb 2012 21:54:53 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199267#comment-13199267 ]

Robert Muir commented on LUCENE-3748:
-------------------------------------

I think we should do it (despite the cruft).

One of these days we will realize our goal of a stable interface between IndexWriter etc. and
analyzers, such that if you are really worried about this with old indexes, you just use
lucene-analyzers-ancient-version.jar and it works with the newer lucene-core.jar.

But until then, I think we need it (e.g. we add a deprecated ctor for API compatibility that
forwards to Version.LUCENE_35) and conditionalize the handling based on Version.

If you don't want to cruft-it-up, lemme know; otherwise feel free to add a patch :)
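
A rough sketch of the shape suggested above, not the committed patch: a deprecated no-arg constructor forwards to the old behavior, and the new quote character is only honored from a newer version on. To stay self-contained it does not use Lucene classes. The class name, the CompatVersion enum (a stand-in for Lucene's Version) and the V36 cut-over point are all assumptions made for illustration.

```java
// Hypothetical, Lucene-free sketch of version-conditional possessive stripping.
final class VersionedPossessiveStripper {

    /** Stand-in for Lucene's Version constants; V36 as the cut-over is an assumption. */
    enum CompatVersion {
        V35, V36;

        boolean onOrAfter(CompatVersion other) {
            return compareTo(other) >= 0;
        }
    }

    private final CompatVersion matchVersion;

    /** @deprecated kept for API compatibility; forwards to the old (V35) behavior. */
    @Deprecated
    VersionedPossessiveStripper() {
        this(CompatVersion.V35);
    }

    VersionedPossessiveStripper(CompatVersion matchVersion) {
        this.matchVersion = matchVersion;
    }

    /** Removes a trailing 's or 'S; U+2019 counts as a quote only on V36 or later. */
    String strip(String term) {
        int len = term.length();
        if (len < 2) {
            return term;
        }
        char quote = term.charAt(len - 2);
        char last = term.charAt(len - 1);
        boolean isQuote = quote == '\''
                || (matchVersion.onOrAfter(CompatVersion.V36) && quote == '\u2019');
        if (isQuote && (last == 's' || last == 'S')) {
            return term.substring(0, len - 2);
        }
        return term;
    }
}
```

With this layout, callers that still use the deprecated constructor keep producing the same terms as before, so existing indexes continue to match; only callers that opt in to the newer version see the changed handling.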

                
> EnglishPossessiveFilter should work with Unicode right single quotation mark
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-3748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3748
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.1, 3.2, 3.4, 3.5
>            Reporter: David Croley
>            Priority: Minor
>         Attachments: LucenePatch
>
>
> The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes possessives using
> only the '\'' character (plus 's' or 'S'), but some common systems (German?) insert the Unicode
> "\u2019" (RIGHT SINGLE QUOTATION MARK) instead, and this is not removed when processing UTF-8
> text. I propose to change EnglishPossessiveFilter to support '\u2019' as an alternative to
> '\''.
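
For illustration only, the proposed check might look roughly like the following when applied to a term's character buffer. This is a sketch under assumptions, not the attached patch: PossessiveSuffix and strippedLength are hypothetical names, and the method works on a plain char[] with a length instead of Lucene's term attribute.

```java
// Hypothetical helper: computes the term length after dropping a possessive suffix.
final class PossessiveSuffix {

    private PossessiveSuffix() {}

    /**
     * Returns length - 2 if the term ends in an apostrophe (ASCII ' or U+2019)
     * followed by 's' or 'S'; otherwise returns length unchanged.
     */
    static int strippedLength(char[] buffer, int length) {
        if (length >= 2) {
            char quote = buffer[length - 2];
            char last = buffer[length - 1];
            boolean isApostrophe = quote == '\'' || quote == '\u2019';
            if (isApostrophe && (last == 's' || last == 'S')) {
                return length - 2;
            }
        }
        return length;
    }
}
```

A filter could then truncate the term to the returned length; terms without a possessive suffix keep their original length.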


