lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark
Date Fri, 03 Feb 2012 00:20:55 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199398#comment-13199398 ]

Robert Muir commented on LUCENE-3748:
-------------------------------------

Walter: U+2019 does not decompose at all (see http://unicode.org/cldr/utility/character.jsp?a=2019&B1=Show)

This is because it's not a compatibility character of any kind. In fact, it's the single quote
(U+0027) that's ambiguous; U+2019 is the correct one here.
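
The point about decomposition can be checked from Java with the standard
java.text.Normalizer class. This is only a minimal sketch: the class and method
names are hypothetical, and U+FB01 (LATIN SMALL LIGATURE FI) is included purely
as a contrasting example of a character that does have a compatibility
decomposition.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class DecompositionCheck {

    /** Returns true if NFKD normalization leaves the given code point unchanged. */
    static boolean isUnchangedByNfkd(int codePoint) {
        String original = new String(Character.toChars(codePoint));
        String decomposed = Normalizer.normalize(original, Normalizer.Form.NFKD);
        return original.equals(decomposed);
    }

    public static void main(String[] args) {
        // RIGHT SINGLE QUOTATION MARK: no decomposition mapping.
        System.out.println("U+2019 unchanged: " + isUnchangedByNfkd(0x2019));
        // APOSTROPHE: plain ASCII, also unchanged.
        System.out.println("U+0027 unchanged: " + isUnchangedByNfkd(0x0027));
        // LATIN SMALL LIGATURE FI: a compatibility character, decomposes to "fi".
        System.out.println("U+FB01 unchanged: " + isUnchangedByNfkd(0xFB01));
    }
}
```

So a normalization step placed before the filter would not turn U+2019 into
U+0027; the filter has to recognize U+2019 itself.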

From a pedantic point of view, we should be forcing you to disambiguate the very ambiguous
single quote (U+0027) on your keyboard and *ONLY* handling U+2019 in this filter, but I
realize some people might find this opinion a tad extreme :)



                
> EnglishPossessiveFilter should work with Unicode right single quotation mark
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-3748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3748
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.1, 3.2, 3.4, 3.5
>            Reporter: David Croley
>            Assignee: Robert Muir
>            Priority: Minor
>         Attachments: LucenePatch, Patch-Lucene-3748
>
>
> The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes possessives using
> only the '\'' character (plus 's' or 'S'), but some common systems (German?) insert the Unicode
> "\u2019" (RIGHT SINGLE QUOTATION MARK) instead and this is not removed when processing UTF-8
> text. I propose to change EnglishPossessiveFilter to support '\u2019' as an alternative to
> '\''.
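
The proposed behavior can be sketched on plain strings as follows. This is not
the code of EnglishPossessiveFilter or of the attached patches; the class and
method names are hypothetical, and it only assumes the rule described above:
drop a trailing apostrophe plus 's' or 'S', accepting either U+0027 or U+2019
as the apostrophe.

```java
// Hypothetical stand-alone sketch; it does not depend on Lucene.
class PossessiveSketch {

    /** True for the ASCII apostrophe (U+0027) and RIGHT SINGLE QUOTATION MARK (U+2019). */
    static boolean isApostrophe(char c) {
        return c == '\'' || c == '\u2019';
    }

    /** Strips a trailing possessive ("'s" or "'S", with either apostrophe) from a term. */
    static String stripPossessive(String term) {
        int n = term.length();
        if (n >= 2
                && isApostrophe(term.charAt(n - 2))
                && (term.charAt(n - 1) == 's' || term.charAt(n - 1) == 'S')) {
            return term.substring(0, n - 2);
        }
        // No possessive suffix: return the term unchanged.
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("John's"));      // ASCII apostrophe
        System.out.println(stripPossessive("John\u2019s")); // U+2019
        System.out.println(stripPossessive("cats"));        // left as-is
    }
}
```

Both "John's" and "John\u2019s" reduce to "John" here, while a term without the
suffix passes through untouched.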

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


