lucene-dev mailing list archives

From "Walter Underwood (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark
Date Fri, 03 Feb 2012 00:14:54 GMT


Walter Underwood commented on LUCENE-3748:

Why make separate patches for characters instead of using Unicode normalization? Converting
to NFKC would also solve this for the prime character (U+2032) and any other codepoint that
is equivalent.

Compatibility normalization is designed for precisely this purpose: equivalence that ignores appearance.
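
The suggestion above can be checked directly. The sketch below is only an illustration, not part of the issue or its patches, and the class and method names are hypothetical. It uses java.text.Normalizer to test whether NFKC maps a given code point to U+0027. As far as I know, U+2019 and U+2032 carry no compatibility decomposition in the Unicode Character Database, while U+FF07 (FULLWIDTH APOSTROPHE) does, so it seems worth running a check like this before relying on normalization alone.

```java
import java.text.Normalizer;

public class NfkcApostropheCheck {

    /** Returns the NFKC form of a single code point as a String. */
    static String nfkc(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    /** True if NFKC maps the code point to the ASCII apostrophe (U+0027). */
    static boolean foldsToApostrophe(int codePoint) {
        return "'".equals(nfkc(codePoint));
    }

    public static void main(String[] args) {
        // Candidates: right single quotation mark, prime, fullwidth apostrophe.
        int[] candidates = {0x2019, 0x2032, 0xFF07};
        for (int cp : candidates) {
            System.out.printf("U+%04X folds to U+0027 under NFKC: %b%n",
                    cp, foldsToApostrophe(cp));
        }
    }
}
```

If a code point is reported as not folding, normalization would leave it in the token and the filter would still need to handle it explicitly.
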
> EnglishPossessiveFilter should work with Unicode right single quotation mark
> ----------------------------------------------------------------------------
>                 Key: LUCENE-3748
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.1, 3.2, 3.4, 3.5
>            Reporter: David Croley
>            Assignee: Robert Muir
>            Priority: Minor
>         Attachments: LucenePatch, Patch-Lucene-3748
> The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes possessives using
> only the '\'' character (plus 's' or 'S'), but some common systems (German?) insert the Unicode
> "\u2019" (RIGHT SINGLE QUOTATION MARK) instead, and this is not removed when processing UTF-8
> text. I propose to change EnglishPossessiveFilter to support '\u2019' as an alternative to
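
The behaviour described in the issue can be sketched in isolation. This is a minimal standalone illustration, assuming the filter only needs to drop a trailing apostrophe plus 's' or 'S'; it is not the Lucene implementation or either attached patch, and PossessiveSketch and stripPossessive are hypothetical names.

```java
public class PossessiveSketch {

    /**
     * Strips a trailing possessive ('s or 'S) from a term, accepting either
     * the ASCII apostrophe (U+0027) or RIGHT SINGLE QUOTATION MARK (U+2019).
     */
    static String stripPossessive(String term) {
        int n = term.length();
        if (n >= 2) {
            char last = term.charAt(n - 1);
            char mark = term.charAt(n - 2);
            boolean isS = last == 's' || last == 'S';
            boolean isApostrophe = mark == '\'' || mark == '\u2019';
            if (isS && isApostrophe) {
                // Drop the apostrophe and the trailing s.
                return term.substring(0, n - 2);
            }
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("dog's"));
        System.out.println(stripPossessive("dog\u2019s"));
        System.out.println(stripPossessive("dogs"));
    }
}
```

Handling the second character as one extra comparison keeps the change local to the filter and does not require a normalization pass over the token stream.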
