lucene-solr-user mailing list archives

From Matt Kangas <kan...@gmail.com>
Subject Re: Issue with 2WD and 4WD in query
Date Mon, 10 Dec 2007 17:31:25 GMT
I suppose you'll have to take WordDelimiterFilter out of your analysis  
chain, at least for that field. Or, perhaps toggling the  
"generateNumberParts" argument will have some effect? The API  
documentation should be your best resource here...
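
A minimal sketch of the first option, assuming a separate field type for car names. The type name "text_nodelim" is hypothetical; the tokenizer and filter classes are ones already used in the schema quoted below. Without WordDelimiterFilterFactory, the whitespace tokenizer should leave "2WD" as a single token, which LowerCaseFilterFactory then turns into "2wd". This is untested here, so check the result on the Analysis page.

```xml
<!-- Hypothetical field type: same building blocks as "text", minus WordDelimiterFilterFactory -->
<fieldType name="text_nodelim" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer with no type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the car name field at the new type -->
<field name="car_name" type="text_nodelim" indexed="true" stored="true"/>
```

Documents indexed under the old analysis chain keep their old tokens, so the field needs to be reindexed after a change like this.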

--matt

On Dec 10, 2007, at 11:48 AM, Brendan Grainger wrote:

> Hi Matt,
>
> Thanks for the reply. I ran my queries through the Analysis page and
> I see the same split you describe. Any ideas about how to make 2WD
> and 4WD be terms on their own?
>
> Thanks
>
> On Dec 10, 2007, at 11:41 AM, Matt Kangas wrote:
>
>> Brendan, pull up your Solr Admin "Analysis" page and try running  
>> your queries through that. The output will tell you precisely how  
>> each analyzer affects your tokens on either the index or query side.
>>
>> In my own quick test, WordDelimiterFilterFactory seems inclined to  
>> break "2WD" into ("2","WD")
>>
>> (using org.apache.solr.analysis.WordDelimiterFilterFactory  
>> {catenateWords=1, catenateNumbers=1, catenateAll=0,  
>> generateNumberParts=1, generateWordParts=1})
>>
>> --matt
>>
>> On Dec 9, 2007, at 6:41 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> I hope you can help me. I'm having an odd problem with Solr. I  
>>> have a field that represents a car. A car could have a  
>>> name like "Silverado" or could be something like "Silverado 2WD"  
>>> to denote the 2 wheel drive version of the car. Anyway, all is  
>>> well when I search over the field for "Silverado", but when I try  
>>> searching for "2WD" (doesn't matter what case) nothing is  
>>> returned. Same applies for "Silverado 2WD" etc. I currently have  
>>> the field defined as text, ie:
>>>
>>> <field name="car_name" type="text" indexed="true" stored="true" />
>>>
>>> But I've also tried defining my own (simpler) field with no luck.  
>>> FYI my text field is defined like this:
>>>
>>>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>     <analyzer type="index">
>>>       <!-- This is supposed to remove HTML tags before indexing -->
>>>       <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
>>>       <!--
>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>       -->
>>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>     </analyzer>
>>>     <analyzer type="query">
>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>       <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>>>       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>     </analyzer>
>>>   </fieldType>
>>>
>>> Any help?
>>>
>>> Thanks!
>>> Brendan
>>
>> --
>> Matt Kangas / kangas@gmail.com
>>
>>
>

--
Matt Kangas / kangas@gmail.com


