lucene-java-user mailing list archives

From "Michael Tobias" <mich...@tobias.org.uk>
Subject ASCIIFoldingFilterFactory
Date Fri, 06 Jun 2014 00:05:23 GMT
Hi there

I am a relatively new Solr user, so please be gentle with me.

I am experimenting with various phonetic filters, and the tokens they create
can vary depending on whether the words contain diacritical characters.

My problem is that the documents being indexed are not always consistent in
their use of diacritics (sometimes the same word appears with diacritics and
sometimes without), and of course when users submit queries they may or may
not use diacritics properly.

If I weren't trying to use phonetic matching, I would simply use
ASCIIFoldingFilterFactory to remove any problem characters and match on the
folded form.
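For reference, the folding step on its own can be approximated in plain Java. This is a minimal sketch, not Lucene's ASCIIFoldingFilter: it assumes that NFD decomposition followed by removal of combining marks is close enough for accented Latin letters, and the class name `FoldDemo` is hypothetical. Letters with no decomposition (such as ø, ß or æ) are left unchanged here, whereas ASCIIFoldingFilter does map them.

```java
import java.text.Normalizer;

class FoldDemo {
    // Approximate ASCII folding: decompose to NFD (e.g. "é" -> "e" + U+0301),
    // then delete the combining marks. Not Lucene's implementation.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes avoid source-encoding issues: café, Müller, naïve.
        String[] words = {"caf\u00E9", "M\u00FCller", "na\u00EFve", "plain"};
        for (String word : words) {
            System.out.println(word + " -> " + fold(word));
        }
    }
}
```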

What I would like to do is create phonetic tokens for both the diacritic
version and the folded version of each word, but store all of the tokens in
a single phonetic field for querying purposes.

How can I achieve that?

I did find a few references online to "ASCIIFoldingExpansionFilterFactory",
which appears to do what I want: when it creates the folded version of a
word, it seems to keep the diacritic version too. I could then apply my
phonetic filter to those expanded tokens.
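The expand-then-encode idea can be sketched outside Lucene to show why it helps. Everything below is a hypothetical stand-in, not ASCIIFoldingExpansionFilterFactory or any Solr API: `expand` mimics keeping the original token alongside its folded form, `soundex` is a simplified Soundex (no H/W rule, and not the encoder behind Solr's phonetic filters), and a `LinkedHashSet` plays the role of the single phonetic field. The encoder drops characters outside A-Z, so an accented first letter changes the code.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ExpandThenEncode {
    // Stand-in for ASCII folding: NFD decomposition, then drop combining marks.
    static String fold(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Emit the original token and, when it differs, its folded form.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);
        String folded = fold(token);
        if (!folded.equals(token)) {
            out.add(folded);
        }
        return out;
    }

    // Soundex digit for an uppercase ASCII letter; '0' for vowels, H, W and Y.
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    // Simplified Soundex: characters outside A-Z are silently dropped,
    // adjacent letters with the same digit are coded once, result is 4 chars.
    static String soundex(String word) {
        StringBuilder letters = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') letters.append(c);
        }
        if (letters.length() == 0) return "";
        StringBuilder out = new StringBuilder().append(letters.charAt(0));
        char prev = code(letters.charAt(0));
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char digit = code(letters.charAt(i));
            if (digit != '0' && digit != prev) out.append(digit);
            prev = digit;
        }
        while (out.length() < 4) out.append('0');
        return out.toString();
    }

    // Expand every whitespace-separated token, encode each variant,
    // and collect all codes into one "field" (duplicates collapse).
    static Set<String> phoneticField(String text) {
        Set<String> field = new LinkedHashSet<>();
        for (String token : text.split("\\s+")) {
            for (String variant : expand(token)) {
                field.add(soundex(variant));
            }
        }
        return field;
    }

    public static void main(String[] args) {
        // "Émile Müller" written with Unicode escapes.
        System.out.println(phoneticField("\u00C9mile M\u00FCller")); // [M400, E540, M460]
    }
}
```

Here "Émile" encodes to M400 while "Emile" encodes to E540, so indexing both variants means a query for either spelling produces a code already present in the field. Newer Lucene releases also give ASCIIFoldingFilter a `preserveOriginal` option that keeps the unfolded token alongside the folded one; whether it is available depends on the Lucene/Solr version in use.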

Is there any other way to do this? Or, if ASCIIFoldingExpansionFilterFactory
is the only or easiest way to do the job, can somebody tell me how to
incorporate it into my Solr setup?

Many thanks!

Michael


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

