lucene-solr-user mailing list archives

From Rahul R <>
Subject Re: Trouble Configuring WordDelimiterFilterFactory
Date Thu, 26 Nov 2009 05:53:07 GMT
Would really appreciate any inputs/suggestions on this. Thank you.

On Tue, Nov 24, 2009 at 10:59 PM, Rahul R <> wrote:

> Hello,
> In our application we have a catch-all field (the 'text' field) which is
> configured as the default search field. This field holds a combination of
> numbers, letters, special characters, etc. I have a requirement that the
> WordDelimiterFilterFactory must not act on numbers, especially those with
> decimal points. Accuracy of results for numerical data is quite important,
> so if the text field of a document has data like "Bridge-Diode 3.55 Volts",
> I want to make sure that a search for "355" or "35.5" does not retrieve this
> document. I found that the following setting for the
> WordDelimiterFilterFactory works for me (for the most part):
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="0" catenateWords="1" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
> preserveOriginal="1"/>
> I am using the same setting for both index and query.
> The only remaining problem is data like ".355". With the above
> setting, the analysis jsp shows me that WordDelimiterFilterFactory
> creates both ".355" and "355" as term texts. So a search for ".355"
> retrieves documents containing either ".355" or "355", and a search for
> "355" has the same effect. I noticed that when the entry for the
> WordDelimiterFilterFactory was removed completely (from both index and
> query), the problem went away. But this seems too harsh a measure.
> Is there a way to prevent the WordDelimiterFilterFactory from acting on
> numerical data at all?
> Regards
> Rahul
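
To make the requirement above concrete, here is a rough sketch of the behaviour being asked for, written in Python outside Solr. It is only an illustration and not Solr's implementation or any WordDelimiterFilterFactory option. The names NUMERIC, delimit and analyze are made up for this sketch, and the definition of "numeric token" (optional sign, digits, at most one decimal point) is an assumption.

```python
import re

# Assumption: a "numeric" token is an optional sign followed by digits
# with at most one decimal point, e.g. "355", "3.55", ".355".
NUMERIC = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")


def delimit(token):
    """Return the terms to index for one whitespace-separated token.

    Hypothetical helper, not a Solr API. Numeric tokens pass through
    untouched; other tokens keep the original, their alphabetic word
    parts, and the catenated word parts.
    """
    if NUMERIC.match(token):
        # ".355" stays ".355" and never produces "355".
        return [token]

    parts = re.findall(r"[A-Za-z]+", token)
    terms = [token]  # keep the original token
    for part in parts:
        if part not in terms:
            terms.append(part)  # individual word parts
    if len(parts) > 1:
        joined = "".join(parts)
        if joined not in terms:
            terms.append(joined)  # catenated word parts
    return terms


def analyze(text):
    """Split on whitespace and apply delimit() to every token."""
    return [term for tok in text.split() for term in delimit(tok)]


if __name__ == "__main__":
    print(analyze("Bridge-Diode 3.55 Volts"))
    print(delimit(".355"))
```

With this behaviour a search term "355" would match only documents whose text contains the token "355", and ".355" only those containing ".355".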
