lucene-solr-user mailing list archives

From "Mike L." <javaone...@yahoo.com.INVALID>
Subject WordDelimiterFilterFactory - tokenizer question
Date Sun, 05 Apr 2015 07:39:38 GMT
Solr User Group,
    I have a non-multivalued field that contains stored values similar to these:

US100A
US100B
US100C
US100-D
US100BBA
My assumption was that, if I tokenized with the fieldType definition below, the
WordDelimiterFilterFactory (specifically splitOnNumerics) and the LowerCaseFilterFactory
would give me Solr matches for the following queries:
?q=US 100
?q=US100
across all of those field values. In other words, US100A, US100B, US100C and US100-D would
all have matched and been scored against my qf weights. However, I'm not seeing that
behavior. I have tried various combinations and am starting to question my assumptions
about the tokenizer.

Ideally, I would like to return all of these values (US100A, US100B, US100C, US100-D) when,
for example, q=US100A is searched on this field.

I know I should probably provide the debugQuery results, but I was hoping this would be a
quick hit for somebody, and I'm also reindexing at the moment. WordDelimiterFilterFactory
doesn't seem to be working as I expected. I'm hoping to get some clarification, or to hear
if something sticks out here.

Below is the field type definition being used:
<fieldType name="field_tokenized" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1" preserveOriginal="1"
            generateWordParts="0" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
            expand="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1"
            generateWordParts="0" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
  </analyzer>
</fieldType>
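
One variant that might be worth testing is sketched here. It is not a verified fix. It assumes that generateWordParts and generateNumberParts decide whether the alphabetic and numeric pieces of a split token are emitted at all, so that with both set to "0" only the preserved original (for example us100a) reaches the index, and the query side, which has no preserveOriginal, may emit nothing. The fieldType name field_tokenized_parts is hypothetical, and the synonym and trim filters are left out only for brevity.

```xml
<!-- Sketch only: hypothetical variant of field_tokenized with part generation enabled. -->
<fieldType name="field_tokenized_parts" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: "1" on the generate* flags emits the split pieces,
         e.g. us / 100 / a for US100A, next to the preserved original. -->
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same flags on the query side so that US100 is analyzed to us / 100. -->
    <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  </analyzer>
</fieldType>
```

If that assumption holds, q=US100 would be analyzed to us and 100, and both terms would be indexed for US100A, US100B, US100C and US100-D. Whether all of them come back for q=US100A would still depend on how the query parser treats the extra a term (for example the default operator or mm). The Analysis screen in the Solr admin UI and the debugQuery output should show what is actually emitted on each side.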


Thanks in advance.
Mike




