lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark
Date Thu, 02 Feb 2012 21:02:54 GMT


Robert Muir commented on LUCENE-3748:

Those are my thoughts exactly, Steven.

I think by default we should go with U+0027 and U+2019 (and, as I mentioned, U+FF07 either
way; it's less important).
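
To make the proposed default concrete, here is a minimal, self-contained sketch of possessive
stripping that accepts U+0027, U+2019 and (optionally) U+FF07. It is only an illustration of
the idea under discussion, not the actual EnglishPossessiveFilter code or the attached patch;
the class name PossessiveSketch and the method stripPossessive are hypothetical, and it works
on plain strings rather than on a Lucene TokenStream.

```java
// Hypothetical illustration; not Lucene's EnglishPossessiveFilter.
final class PossessiveSketch {

    // Apostrophe forms named in this thread: U+0027 (ASCII apostrophe),
    // U+2019 (RIGHT SINGLE QUOTATION MARK) and U+FF07 (FULLWIDTH APOSTROPHE).
    private static boolean isApostrophe(char c) {
        return c == '\'' || c == '\u2019' || c == '\uFF07';
    }

    // Removes a trailing apostrophe + 's' or 'S' from a single term.
    // Any other term is returned unchanged.
    static String stripPossessive(String term) {
        int len = term.length();
        if (len >= 2) {
            char last = term.charAt(len - 1);
            if ((last == 's' || last == 'S') && isApostrophe(term.charAt(len - 2))) {
                return term.substring(0, len - 2);
            }
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("Bob's"));       // ASCII apostrophe
        System.out.println(stripPossessive("Bob\u2019s"));  // U+2019
        System.out.println(stripPossessive("cats"));        // no possessive, unchanged
    }
}
```

Other look-alike characters (U+2018, for example) are deliberately left alone in this sketch.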

As for other look-alikes, sure, it could happen, BUT the user could just place ASCIIFoldingFilter
before EnglishPossessiveFilter if they want that more brutal behavior... that's a lossier
normalization that I don't think we should do by default...

> EnglishPossessiveFilter should work with Unicode right single quotation mark
> ----------------------------------------------------------------------------
>                 Key: LUCENE-3748
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.1, 3.2, 3.4, 3.5
>            Reporter: David Croley
>            Priority: Minor
>         Attachments: LucenePatch
> The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes possessives using
> only the '\'' character (plus 's' or 'S'), but some common systems (German?) insert the Unicode
> "\u2019" (RIGHT SINGLE QUOTATION MARK) instead and this is not removed when processing UTF-8
> text. I propose to change EnglishPossessiveFilter to support '\u2019' as an alternative to
