lucene-java-user mailing list archives

From Steve Rowe
Subject Re: ASCIIFoldingFilterFactory
Date Fri, 06 Jun 2014 00:48:21 GMT
Hi Michael,

Questions about Solr should go to the Solr user mailing list, rather than this list, which
is for Lucene users - see <> for how to

I’ve never heard of ASCIIFoldingExpansionFilterFactory, but ASCIIFoldingFilterFactory has
a new option “preserveOriginal”, introduced in Lucene/Solr 4.7 by LUCENE-5437,
that should do the trick.

Just add preserveOriginal="true" - see the example in the javadocs (if you copy/paste
it, make sure you change the attribute value from "false", as it is in the example, to
"true").

Note that, as Ahmet Arslan points out on LUCENE-5437, queries that generate multiple
terms (e.g. prefix and regex queries) will trigger a failure.  You can work around this problem
by defining both "index" and "query" analyzer types for the fieldtype you use with this
field, and using preserveOriginal="true" only on the "index" analyzer type.
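
A minimal sketch of that workaround in schema.xml might look like the following. The field type name (text_folded_phonetic), the standard tokenizer, and the Double Metaphone filter are illustrative assumptions standing in for whatever tokenizer and phonetic filter you are actually using; only the preserveOriginal attribute on ASCIIFoldingFilterFactory comes from the discussion above.

```xml
<!-- Hypothetical field type: the name, tokenizer, and phonetic filter are
     illustrative assumptions, not taken from the original poster's schema. -->
<fieldType name="text_folded_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Index time: emit the folded token and, when folding changed it,
         also keep the original token (needs Lucene/Solr 4.7 or later). -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <!-- inject="false" keeps only the phonetic codes and drops the input tokens. -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Query time: fold only, so multi-term queries (prefix, regex)
         see a single term per input token. -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>
```

With a layout like this, a word containing diacritics should reach the phonetic filter at index time in both its original and folded forms, while query terms are only folded before phonetic encoding.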

See the Solr Reference Guide for more info about analyzers in Solr.


On Jun 5, 2014, at 8:05 PM, Michael Tobias wrote:

> Hi there
> I am a relative newbie Solr user so please be gentle with me.
> I am experimenting with various phonetic filters and the tokens created can
> vary depending on whether the words contain diacritical characters.
> My problem is that the documents being indexed are not always consistent in
> terms of the use of diacritics (sometimes the same word can have diacritics
> and sometimes not) and of course when users submit  queries they may or may
> not use diacritics properly.
> If I wasn't trying to use phonetic matching I would simply use the
> ASCIIFoldingFilterFactory to remove any problem characters and match on
> that.
> What I would like to do is create phonetic tokens for both the
> diacritic-version of the word and the folded-version of the word - but I
> would like to store the tokens in a single phonetic field for querying
> purposes.....
> How can I achieve that????
> I did find a few references online to "ASCIIFoldingExpansionFilterFactory"
> which appears to do what I want - when creating the 'folded' version of a
> word it appears to keep the diacritic version too. I could then apply my
> phonetic filter to those expanded tokens.
> Is there any other way to do this?  Or if ASCIIFoldingExpansionFilterFactory
> is the only or easiest way to do the job can somebody tell me HOW to
> incorporate that into my Solr setup????
> Many thanks!!
> Michael
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
