lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Date Fri, 19 Sep 2008 14:52:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632706#action_12632706 ]

Steven Rowe commented on LUCENE-1390:
-------------------------------------

bq. The Extended-C and D blocks also have relevant things to include

These two blocks were not included in Unicode 3.0, the version supported by Java 1.4.2, which
is the Java version that Lucene 2.X supports.

Nevertheless, the ranges these two blocks occupy in Unicode 5.1 are unassigned code points in
Unicode 3.0, so I don't think it would be a problem to add them.
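A quick way to confirm the block ranges on a particular JVM is sketched below. It assumes Java 7 or later, where `Character.UnicodeBlock.LATIN_EXTENDED_C` and `LATIN_EXTENDED_D` exist (Java 1.4.2 has neither constant). The class `BlockCheck` and its methods are hypothetical. In current Unicode versions Latin Extended-C spans U+2C60..U+2C7F and Latin Extended-D spans U+A720..U+A7FF; `countDefined` reports how many of those code points the running JVM's Unicode tables treat as assigned.

```java
// Hypothetical helper; requires Java 7+ for the LATIN_EXTENDED_C/D constants.
final class BlockCheck {
    // Block boundaries as listed in the Unicode Character Database (Blocks.txt).
    static final int EXT_C_START = 0x2C60, EXT_C_END = 0x2C7F;
    static final int EXT_D_START = 0xA720, EXT_D_END = 0xA7FF;

    /** Returns true if every code point in [start, end] belongs to the given block. */
    static boolean allInBlock(int start, int end, Character.UnicodeBlock block) {
        for (int cp = start; cp <= end; cp++) {
            if (Character.UnicodeBlock.of(cp) != block) {
                return false;
            }
        }
        return true;
    }

    /** Counts code points in [start, end] that the running JVM reports as assigned. */
    static int countDefined(int start, int end) {
        int n = 0;
        for (int cp = start; cp <= end; cp++) {
            if (Character.isDefined(cp)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("Latin Extended-C assigned: "
            + countDefined(EXT_C_START, EXT_C_END) + " of " + (EXT_C_END - EXT_C_START + 1));
        System.out.println("Latin Extended-D assigned: "
            + countDefined(EXT_D_START, EXT_D_END) + " of " + (EXT_D_END - EXT_D_START + 1));
    }
}
```

Because the counts depend on the Unicode version bundled with the JVM, the program prints them rather than assuming fixed values.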

I'll look at adding more characters this weekend.

I will also add the Unicode character name to the comment for each character (e.g.
"LATIN CAPITAL LETTER A WITH MACRON").

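Generating the commented switch labels, rather than typing each name by hand, could be done with a small tool like the one below. `CaseLabelGen` and `caseLabel` are hypothetical names; the tool relies on `Character.getName(int)`, which exists only since Java 7, so it would have to run on a newer JDK. The source it emits is plain Java that still compiles on 1.4.2.

```java
// Hypothetical code generator; Character.getName requires Java 7+.
final class CaseLabelGen {
    /** Formats one switch label with the character's Unicode name as a trailing comment. */
    static String caseLabel(int cp) {
        String name = Character.getName(cp); // null for unassigned code points
        return String.format("case '\\u%04X': // %s", cp, name == null ? "<unassigned>" : name);
    }

    public static void main(String[] args) {
        // Latin Extended-A: U+0100..U+017F
        for (int cp = 0x0100; cp <= 0x017F; cp++) {
            System.out.println(caseLabel(cp));
        }
    }
}
```

For U+0100 the generated line is `case '\u0100': // LATIN CAPITAL LETTER A WITH MACRON`.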
> add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
> ------------------------------------------------------------
>
>                 Key: LUCENE-1390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1390
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>         Environment: any
>            Reporter: Andi Vajda
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ISOLatinAccentFilter.java
>
>
> The ISOLatin1AccentFilter removes accents from accented characters in the ISO Latin 1 character set.
> It does what it does and there is no bug with it.
> It would be nicer, though, if there were a more comprehensive version of this code that included not just ISO-Latin-1 (ISO-8859-1) but the entire Latin-1 Supplement and Latin Extended-A Unicode blocks.
> See: http://en.wikipedia.org/wiki/Latin-1_Supplement_unicode_block
> See: http://en.wikipedia.org/wiki/Latin_Extended-A_unicode_block
> That way, many more languages written in the Latin alphabet are covered.
> A new class, ISOLatinAccentFilter, is attached. It is intended to supersede ISOLatin1AccentFilter, which should be deprecated.
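The folding the issue describes can be approximated with the standalone sketch below. It swaps in a different technique from ISOLatin1AccentFilter's explicit per-character mapping: Unicode canonical decomposition via `java.text.Normalizer` (Java 6+, so not available on Java 1.4.2), with combining marks dropped, plus a small hand-written table for letters such as Ł or Ø that have no canonical decomposition. `LatinAccentFolder`, `fold`, and `SPECIAL` are hypothetical names, the code works on a `String` rather than a Lucene token, and the table is illustrative rather than complete.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the attached ISOLatinAccentFilter: folds characters in
// Latin-1 Supplement (U+0080..U+00FF) and Latin Extended-A (U+0100..U+017F).
// Requires Java 6+ for java.text.Normalizer.
final class LatinAccentFolder {
    // Letters with no canonical decomposition (strokes, ligatures, sharp s, ...)
    // need explicit replacements. Illustrative only, not exhaustive.
    private static final Map<Character, String> SPECIAL = new HashMap<Character, String>();
    static {
        SPECIAL.put('\u00C6', "AE"); // LATIN CAPITAL LETTER AE
        SPECIAL.put('\u00E6', "ae"); // LATIN SMALL LETTER AE
        SPECIAL.put('\u00D0', "D");  // LATIN CAPITAL LETTER ETH
        SPECIAL.put('\u00F0', "d");  // LATIN SMALL LETTER ETH
        SPECIAL.put('\u00D8', "O");  // LATIN CAPITAL LETTER O WITH STROKE
        SPECIAL.put('\u00F8', "o");  // LATIN SMALL LETTER O WITH STROKE
        SPECIAL.put('\u00DE', "TH"); // LATIN CAPITAL LETTER THORN
        SPECIAL.put('\u00FE', "th"); // LATIN SMALL LETTER THORN
        SPECIAL.put('\u00DF', "ss"); // LATIN SMALL LETTER SHARP S
        SPECIAL.put('\u0110', "D");  // LATIN CAPITAL LETTER D WITH STROKE
        SPECIAL.put('\u0111', "d");  // LATIN SMALL LETTER D WITH STROKE
        SPECIAL.put('\u0141', "L");  // LATIN CAPITAL LETTER L WITH STROKE
        SPECIAL.put('\u0142', "l");  // LATIN SMALL LETTER L WITH STROKE
        SPECIAL.put('\u0152', "OE"); // LATIN CAPITAL LIGATURE OE
        SPECIAL.put('\u0153', "oe"); // LATIN SMALL LIGATURE OE
    }

    /**
     * Removes accents from characters in U+0080..U+017F. Characters outside that
     * range, and characters in it with no decomposition and no table entry, are kept.
     */
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String special = SPECIAL.get(c);
            if (special != null) {
                out.append(special);
            } else if (c >= 0x0080 && c <= 0x017F) {
                // NFD splits e.g. U+00E9 into 'e' + U+0301; keep only non-combining chars.
                String decomposed = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
                for (int j = 0; j < decomposed.length(); j++) {
                    char d = decomposed.charAt(j);
                    if (Character.getType(d) != Character.NON_SPACING_MARK) {
                        out.append(d);
                    }
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " -> " + fold(arg));
        }
    }
}
```

The decomposition route covers most accented letters without listing them, but it still needs the hand-written table and Java 6; an explicit mapping like ISOLatin1AccentFilter's stays Java 1.4 compatible and makes every replacement visible in the source.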

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



