lucene-solr-user mailing list archives

From Charles Riley <zenodo...@gmail.com>
Subject Re: [solrmarc-tech] apostrophe / ayn / alif
Date Thu, 24 May 2012 16:30:41 GMT
Hi Naomi,

I don't have a conclusive answer for you on this yet, but let me pick up on
a few points.

First, the apostrophe is probably being handled because the
ICUCollationKeyFilterFactory is ignoring punctuation.

Alif is a letter, not a diacritic, so its character properties would be
handled as such. That apparently also puts it outside the scope of what the
ICUFoldingFilterFactory does, unless the filter is tailored.
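
One quick way to check the "letter, not diacritic" point is to look at the
Unicode general category of each character. The sketch below assumes the
romanized alif and ayn in the records are stored as U+02BC and U+02BB (the
code points a MARC-8 conversion commonly produces). That is an assumption
about the data, not something confirmed in this thread, so verify it against
the actual records.

```python
import unicodedata

# Assumed code points for illustration; check them against your own records.
CANDIDATES = {
    "apostrophe": "\u0027",
    "alif (modifier letter apostrophe)": "\u02bc",
    "ayn (modifier letter turned comma)": "\u02bb",
    "combining acute accent": "\u0301",
}


def general_categories(chars):
    """Return {label: Unicode general category} for each character."""
    return {label: unicodedata.category(ch) for label, ch in chars.items()}


if __name__ == "__main__":
    for label, category in general_categories(CANDIDATES).items():
        print(f"{label}: {category}")
```

Here Po is punctuation, Lm is a modifier letter, and Mn is a nonspacing
(combining) mark. A category of Lm rather than Mn is consistent with these
characters being treated as letters instead of being stripped as diacritics.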

From the Solr wiki, this looks like a helpful rule of thumb:

"When To use a CharFilter vs a TokenFilter

There are several pairs of CharFilters and TokenFilters that have related
(ie: MappingCharFilter and ASCIIFoldingFilter) or nearly identical
functionality (ie: PatternReplaceCharFilterFactory and
PatternReplaceFilterFactory) and it may not always be obvious which is the
best choice.

The ultimate decision depends largely on what Tokenizer you are using, and
whether you need to "out smart" it by preprocessing the stream of
characters.

For example, maybe you have a tokenizer such as StandardTokenizer and you
are pretty happy with how it works overall, but you want to customize how
some specific characters behave.
In such a situation you could modify the rules and re-build your own
tokenizer with javacc, but perhaps it's easier to simply map some of the
characters before tokenization with a CharFilter."
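
The "map characters before tokenization" idea can be sketched outside Solr
in a few lines of Python. This is only an illustration of the concept, not
Solr's implementation: `CHAR_MAP`, `char_filter`, `tokenize`, and `analyze`
are hypothetical names, the regex tokenizer is a stand-in for a real one,
and the code points for alif (U+02BC) and ayn (U+02BB) are assumptions about
how the records encode them. In Solr itself the same role would be played by
a MappingCharFilterFactory with a mapping file.

```python
import re

# Hypothetical mapping: fold the assumed alif/ayn code points to a plain
# apostrophe before the tokenizer ever sees them.
CHAR_MAP = {
    "\u02bc": "'",  # assumed alif (modifier letter apostrophe)
    "\u02bb": "'",  # assumed ayn (modifier letter turned comma)
}
_TABLE = str.maketrans(CHAR_MAP)


def char_filter(text):
    """Rewrite individual characters in the raw text stream."""
    return text.translate(_TABLE)


def tokenize(text):
    """Stand-in tokenizer: lowercase, keep runs of word chars and apostrophes."""
    return re.findall(r"[\w']+", text.lower())


def analyze(text):
    """Apply the character mapping first, then tokenize."""
    return tokenize(char_filter(text))


if __name__ == "__main__":
    print(analyze("Ch\u02bcoe"))
    print(analyze("Ch'oe"))
```

With the mapping applied first, a romanized Korean name written with the
alif character and the same name written with a plain apostrophe produce
identical tokens, so whatever the later filters do with the apostrophe
applies to both spellings.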


Charles

On Tue, May 15, 2012 at 2:47 PM, Naomi Dushay <ndushay@stanford.edu> wrote:

> We are using the ICUFoldingFilterFactory with great success to fold
> diacritics so searches with and without the diacritics get the same results.
>
> We recently discovered we have some Korean records that use an alif
> diacritic instead of an apostrophe, and this diacritic is NOT getting
> folded.   Has anyone experienced this for alif or ayn characters?   Do you
> have a solution?
>
>
> - Naomi


-- 
*Charles L. Riley*
*Catalog Librarian for Africana*
*Sterling Memorial Library, Yale University*
*<zenodotus@gmail.com>*
*203-432-7566*
