lucene-solr-user mailing list archives

From "Matteo Diarena" <>
Subject Fuzzy Search issues using Solr 4.0
Date Mon, 02 Jul 2012 13:39:01 GMT
Dear Solr Users,

I have been an enthusiastic Solr user since version 1.4. I am now working on
a new Solr-based application that relies heavily on fuzzy search for string
matching.

Unfortunately, I am facing a strange problem with fuzzy search, and I hope
someone can help me get more information.


I indexed several company names in a field named ENTITY_NAME using the
following parameters in schema.xml:

<fieldType name="whitespace_tokenized"
class="solr.WhitespaceTokenizerFactory" />

<field name="ENTITY_NAME" type="whitespace_tokenized" indexed="true"
stored="true" />

One of these companies is "TS PUBLISHING INC"

Below is the list of queries, each with the returned and the expected result:

1)      ENTITY_NAME:(ts AND publising)         => matches, OK

2)      ENTITY_NAME:(ts AND publising~1)       => matches, OK

3)      ENTITY_NAME:(td~1 AND publishing)      => doesn't match, KO (it was supposed to match)

4)      ENTITY_NAME:(ts AND pablisin~3)        => doesn't match, KO (it was supposed to match)
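
As a sanity check on the expected results, here is a minimal sketch that
computes the plain Levenshtein distance for the three fuzzy terms used in the
queries. The class and method names are illustrative assumptions, not part of
Solr or Lucene, and Solr's own fuzzy matching may differ in its details.

```java
// Hypothetical helper, not part of Solr or Lucene: plain Levenshtein distance
// (insertions, deletions, and substitutions each cost 1).
final class EditDistanceCheck {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Query term -> indexed term, taken from queries 2, 3 and 4.
        String[][] pairs = {
            {"publising", "publishing"},
            {"td", "ts"},
            {"pablisin", "publishing"},
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " -> " + p[1] + ": " + levenshtein(p[0], p[1]));
        }
    }
}
```

Running it prints distances of 1, 1 and 3 respectively, so the edit distance
requested in each of queries 2, 3 and 4 is at least as large as the distance
actually needed.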


Why does td~1 not match ts?

Why does pablisin~3 not match publishing?


How can I investigate the problem? 

Is there any parameter I can set in solrconfig.xml? 

Is there any tool I can use to see how the automaton is built?


Thanks a lot in advance,

Matteo Diarena
Senior KM Developer - S.r.l.
Via Luigi Rizzo, 8/1 - 20151 MILANO
Fax  +39 02 8945 3500

Tel  +39 02 8945 3023
Cell +39 345 2129244

