lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Alphanumeric wildcard search problem
Date Wed, 01 Sep 2010 23:58:17 GMT
Oh dear. Wildcard queries aren't analyzed, so I suspect it's a casing issue.

Try two things:
1> search for r-1*
2> look in your index and be sure the actual terms are there as you expect.
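
If it does turn out to be casing, a field type along these lines would keep a part number such as R-1110 as a single lowercased term. This is only a sketch under assumptions: the name text_partno is made up for illustration and does not appear anywhere in this thread, and because the wildcard term itself is not analyzed, the client would still have to lowercase the query (q=r-1*) before sending it.

```xml
<!-- Hypothetical field type; the name "text_partno" is an assumption for illustration -->
<fieldType name="text_partno" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer for both index and query time, so ordinary terms are treated alike -->
  <analyzer>
    <!-- Splits on whitespace only, so "R-1110" stays a single token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercases the token, so the indexed term becomes "r-1110" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The data has to be reindexed after a schema change like this, and the analysis page is the place to confirm that the indexed term really is what you expect.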

HTH
Erick

On Wed, Sep 1, 2010 at 4:35 PM, Hasnain <hasn_36@hotmail.com> wrote:

>
> Thank you for your suggestions.
>
> Before removing the WordDelimiterFilterFactory, q=R-* returned perfect
> results but q=R-1* did not. After removing the
> WordDelimiterFilterFactory, q=R-* no longer returned results either.
>
> The results before removing WordDelimiterFilterFactory, using debugQuery=on,
> were:
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">78</int>
> <lst name="params">
> <str name="debugQuery">on</str>
> <str name="fl">mat_nr</str>
> <str name="q">R-1*</str>
> <str name="qt">standard2</str>
> </lst>
> </lst>
> <result name="response" numFound="0" start="0"/>
> <lst name="debug">
> <str name="rawquerystring">R-1*</str>
> <str name="querystring">R-1*</str>
> <str name="parsedquery">
> +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
> description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6) ()
> </str>
> <str name="parsedquery_toString">
> +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
> prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6 ()
> </str>
> <lst name="explain"/>
> <str name="QParser">DisMaxQParser</str>
> <null name="altquerystring"/>
> <null name="boostfuncs"/>
> <lst name="timing">
> <double name="time">31.0</double>
> <lst name="prepare">
> <double name="time">15.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">15.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> <lst name="process">
> <double name="time">16.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">16.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> </lst>
> </lst>
> </response>
>
> and after removing WordDelimiterFilterFactory:
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">78</int>
> <lst name="params">
> <str name="debugQuery">on</str>
> <str name="fl">mat_nr</str>
> <str name="q">R-1*</str>
> <str name="qt">standard2</str>
> </lst>
> </lst>
> <result name="response" numFound="0" start="0"/>
> <lst name="debug">
> <str name="rawquerystring">R-1*</str>
> <str name="querystring">R-1*</str>
> <str name="parsedquery">
> +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
> description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6) ()
> </str>
> <str name="parsedquery_toString">
> +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
> prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
> manufact_mat:r-1*^0.4)~0.6 ()
> </str>
> <lst name="explain"/>
> <str name="QParser">DisMaxQParser</str>
> <null name="altquerystring"/>
> <null name="boostfuncs"/>
> <lst name="timing">
> <double name="time">31.0</double>
> <lst name="prepare">
> <double name="time">15.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">15.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> <lst name="process">
> <double name="time">16.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">16.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> </lst>
> </lst>
> </response>
>
> Also, the WordDelimiterFilterFactory I used at first was this:
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>
> Before removing WordDelimiterFilterFactory, the Solr admin analysis page showed:
>
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term position   1
> term text       R-1110
> term type       word
> source start,end        0,6
> payload
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> term position   1
> term text       R-1110
> term type       word
> source start,end        0,6
> payload
> org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
> generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0,
> catenateNumbers=1}
> term position   1       2
> term text       R       1110
> term type       word    word
> source start,end        0,1     2,6
> payload
> org.apache.solr.analysis.LowerCaseFilterFactory {}
> term position   1       2
> term text       r       1110
> term type       word    word
> source start,end        0,1     2,6
> payload
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> term position   1       2
> term text       r       1110
> term type       word    word
> source start,end        0,1     2,6
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
> term position   1       2
> term text       r       1110
> term type       word    word
> source start,end        0,1     2,6
> payload
>
>
>
> After removing WordDelimiterFilterFactory, the Solr admin analysis page looks like this:
>
> Index Analyzer
> org.apache.solr.analysis.WhitespaceTokenizerFactory {}
> term position   1
> term text       R-1110
> term type       word
> source start,end        0,6
> payload
> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
> ignoreCase=true, enablePositionIncrements=true}
> term position   1
> term text       R-1110
> term type       word
> source start,end        0,6
> payload
> org.apache.solr.analysis.EnglishPorterFilterFactory
> {protected=protwords.txt}
> term position   1
> term text       R-1110
> term type       word
> source start,end        0,6
> payload
> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
> term position   1
> term text       R-1110
> term type       word
> source start,end        0,6
> payload
>
>
> Any suggestions?
>
> Thank you
>
>
> Erick Erickson wrote:
> >
> > Really look at the analysis page in solr admin for how your
> > analyzer chain handles things, or you'll spend time until you're
> > really old having trouble :).
> >
> > Here's what I see on a quick scan:
> >
> >> StandardTokenizer tries to, among other things, preserve
> > email addresses. The kinds of strings you're working with may
> > trip something up here.
> >
> >> Remove WordDelimiterFilterFactory altogether. The point of WDF
> > is to break words apart at transitions.
> >
> >> Remove EnglishPorterFilterFactory too. What the effect
> > of applying an algorithmic stemming process to words like
> > you're interested in is...er...not obvious.
> >
> > All that said, I took a quick look at the analysis page with your
> > definition and nothing jumped out at me. Are you sure that:
> >> you're getting to the request handler you think? What does adding
> > &debugQuery=on show?
> >> you've indexed the data after you've made the changes you outlined above?
> > The SOLR admin page can help here, especially the [full interface] link,
> > with debug info on.
> >
> > If nothing shows up, can you post the results of &debugQuery=on?
> >
> > Best
> > Erick
> >
> > On Tue, Aug 31, 2010 at 6:11 AM, Hasnain <hasn_36@hotmail.com> wrote:
> >
> >>
> >> I have gone through all of the related posts but could not find a
> >> proper answer that works, so I'm writing this post.
> >>
> >> Is there any way of using wildcard searches on alphanumeric text
> >> like R-1*?
> >>
> >> Let me share the relevant information.
> >>
> >>
> >> <fieldType name="textShoaib" class="solr.TextField"
> >> positionIncrementGap="100">
> >>      <analyzer type="index">
> >>        <tokenizer class="solr.StandardTokenizerFactory"/>
> >>        <!-- This was originally <tokenizer class="solr.WhitespaceTokenizerFactory"/>; just playing around -->
> >>        <filter class="solr.StopFilterFactory"
> >>                ignoreCase="true"
> >>                words="stopwords.txt"
> >>                enablePositionIncrements="true"
> >>                />
> >>        <filter class="solr.WordDelimiterFilterFactory"
> >> generateWordParts="0" generateNumberParts="0" catenateWords="0"
> >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> >> preserveOriginal="1"/>
> >>        <filter class="solr.LowerCaseFilterFactory"/>
> >>        <filter class="solr.EnglishPorterFilterFactory"
> >> protected="protwords.txt"/>
> >>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>      </analyzer>
> >>      <analyzer type="query">
> >>        <tokenizer class="solr.StandardTokenizerFactory"/>
> >> <!--This was originally <tokenizer
> >> class="solr.WhitespaceTokenizerFactory"/>-->
> >>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> ignoreCase="true" expand="true"/>
> >>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >> words="stopwords.txt"/>
> >>        <filter class="solr.WordDelimiterFilterFactory"
> >> generateWordParts="0" generateNumberParts="0" catenateWords="0"
> >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
> >> preserveOriginal="1"/>
> >>        <filter class="solr.LowerCaseFilterFactory"/>
> >>        <filter class="solr.EnglishPorterFilterFactory"
> >> protected="protwords.txt"/>
> >>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >>      </analyzer>
> >>    </fieldType>
> >>
> >>
> >>
> >>
> >> my requestHandler is...
> >>
> >>
> >>
> >>
> >>
> >>  <requestHandler name="standard2" class="solr.SearchHandler">
> >>    <!-- default values for query parameters -->
> >>     <lst name="defaults">
> >>  <str name="defType">dismax</str>
> >>       <str name="echoParams">explicit</str>
> >>  <str name="tie">0.6</str>
> >>  <str name="pf">name^2.3 mat_nr^0.4</str>
> >>  <str name="mm">0%</str>
> >>       <!--
> >>       <int name="rows">10</int>
> >>       <str name="fl">*</str>
> >>       <str name="version">2.1</str>
> >>        -->
> >>     </lst>
> >>
> >>  </requestHandler>
> >>
> >>
> >>
> >> and also the field I want to search on:
> >>
> >>
> >>
> >>  <field name="mat_nr"  type="textShoaib" indexed="true" stored="true"
> >> omitNorms="true"/>
> >>
> >>
> >>
> >> and the query I'm using is
> >>
> >>
> >>
> >> qt=standard2&q=R-1*
> >>
> >>
> >>
> >> but this still doesn't work.
> >>
> >>
> >> Any suggestions on this?
> >>
> >> thanks
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1393332.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Alphanumeric-wildcard-search-problem-tp1393332p1402772.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
