lucene-dev mailing list archives

From "Steven Rowe (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3748) EnglishPossessiveFilter should work with Unicode right single quotation mark
Date Thu, 02 Feb 2012 20:49:00 GMT


Steven Rowe commented on LUCENE-3748:

+1, and +1 to include U+FF07.

There are several other characters listed with U+0027 APOSTROPHE
that could be interpreted visually as an English apostrophe, e.g. U+02BC MODIFIER LETTER APOSTROPHE,
but it would be unusual for people to use those characters as apostrophes in English text,
so I think it would be fine to exclude them. (By contrast, the Unicode standard says that
U+2019 is the *preferred* apostrophe form.)
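
For illustration, the matching discussed here could be sketched roughly as below. This is not the attached patch and not Lucene's actual EnglishPossessiveFilter code: the class name PossessiveSketch and the methods isApostrophe and stripPossessive are hypothetical, and the sketch operates on plain strings rather than on a TokenStream. It assumes U+0027, U+2019, and U+FF07 are accepted as apostrophes and that U+02BC is deliberately excluded.

```java
// Hypothetical sketch; names are illustrative, not Lucene API.
final class PossessiveSketch {

    // Assumption: only these three characters count as an English apostrophe.
    // U+02BC MODIFIER LETTER APOSTROPHE is intentionally left out.
    private static boolean isApostrophe(char c) {
        return c == '\''        // U+0027 APOSTROPHE
            || c == '\u2019'    // RIGHT SINGLE QUOTATION MARK
            || c == '\uFF07';   // FULLWIDTH APOSTROPHE
    }

    // Removes a trailing apostrophe followed by 's' or 'S'; otherwise returns the term unchanged.
    static String stripPossessive(String term) {
        int len = term.length();
        if (len >= 2) {
            char last = term.charAt(len - 1);
            if ((last == 's' || last == 'S') && isApostrophe(term.charAt(len - 2))) {
                return term.substring(0, len - 2);
            }
        }
        return term;
    }
}
```

With this sketch, a term ending in any of the three accepted apostrophes plus 's' or 'S' loses its last two characters, while a term using U+02BC is returned as is.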

> EnglishPossessiveFilter should work with Unicode right single quotation mark
> ----------------------------------------------------------------------------
>                 Key: LUCENE-3748
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 3.1, 3.2, 3.4, 3.5
>            Reporter: David Croley
>            Priority: Minor
>         Attachments: LucenePatch
>
> The current EnglishPossessiveFilter (used in EnglishAnalyzer) removes possessives using
> only the '\'' character (plus 's' or 'S'), but some common systems (German?) insert the Unicode
> "\u2019" (RIGHT SINGLE QUOTATION MARK) instead, and this is not removed when processing UTF-8
> text. I propose to change EnglishPossessiveFilter to support '\u2019' as an alternative to
