lucene-dev mailing list archives

From "DM Smith (JIRA)" <>
Subject [jira] Commented: (LUCENE-2102) LowerCaseFilter for Turkish language
Date Tue, 01 Dec 2009 21:01:20 GMT


DM Smith commented on LUCENE-2102:

bq. but non-NFC text doesn't work correctly throughout most of lucene's analysis components
as it is now anyway, so I don't think we should worry about it right now. Maybe we could add
a comment for the future though.

It might be good to note the NFC (NFKC?) requirement in the JavaDoc.

Maybe it's just me, but I think it is critical to normalize the input to Lucene for both indexing
and searching. Unless an NFCNormalizingFilter is added to Lucene, I think normalization is the
responsibility of the caller.
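
A minimal sketch of what caller-side normalization could look like, using the JDK's java.text.Normalizer. The class name and the toNfc helper are hypothetical, not part of Lucene. It also shows why this matters for Turkish: a dotted capital I can arrive precomposed (U+0130) or decomposed ('I' + U+0307 COMBINING DOT ABOVE), and the two strings only compare equal after NFC normalization.

```java
import java.text.Normalizer;

public class NormalizeBeforeAnalysis {

    // Hypothetical helper: the caller normalizes text to NFC before handing it
    // to Lucene, at both index time and query time.
    static String toNfc(String text) {
        // Skip the normalization work when the input is already NFC.
        if (Normalizer.isNormalized(text, Normalizer.Form.NFC)) {
            return text;
        }
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "I\u0307stanbul"; // 'I' + COMBINING DOT ABOVE
        String composed = "\u0130stanbul";    // LATIN CAPITAL LETTER I WITH DOT ABOVE
        System.out.println(decomposed.equals(composed));        // false
        System.out.println(toNfc(decomposed).equals(composed)); // true
    }
}
```

Whether NFC or NFKC is the right requirement is the open question above; swapping in Normalizer.Form.NFKC would additionally fold compatibility characters.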

> LowerCaseFilter for Turkish language
> ------------------------------------
>                 Key: LUCENE-2102
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: LUCENE-2102.patch
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish alphabet the
> lowercase of 'I' is not 'i' but LATIN SMALL LETTER DOTLESS I (U+0131).
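
For illustration, the difference can be sketched in Java. This is not the attached LUCENE-2102.patch: turkishToLowerCase and lowerTurkish are hypothetical helpers that special-case only the two Turkish capital I's and fall back to Character.toLowerCase for everything else. They assume NFC input; a decomposed 'I' + U+0307 would come out as 'ı' + U+0307.

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {

    // Hypothetical per-code-point Turkish lowercasing (assumes NFC input).
    static int turkishToLowerCase(int codePoint) {
        switch (codePoint) {
            case 'I':      return '\u0131'; // 'I' -> LATIN SMALL LETTER DOTLESS I
            case '\u0130': return 'i';      // LATIN CAPITAL LETTER I WITH DOT ABOVE -> 'i'
            default:       return Character.toLowerCase(codePoint);
        }
    }

    // Applies turkishToLowerCase to every code point of the input.
    static String lowerTurkish(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(turkishToLowerCase(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Locale-independent: always maps 'I' to 'i'.
        System.out.println(Character.toLowerCase('I'));
        // Turkish-aware sketch.
        System.out.println(lowerTurkish("ISPARTA"));
        // The JDK's locale-sensitive String lowercasing, for comparison.
        System.out.println("ISPARTA".toLowerCase(Locale.forLanguageTag("tr")));
    }
}
```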

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
