lucene-solr-user mailing list archives

From sunshine glass <sunshineglassof2...@gmail.com>
Subject Re: Searching words with spaces for word without spaces in solr
Date Wed, 30 Jul 2014 14:38:16 GMT
This is the new configuration:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
                expand="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_text_prime_search.txt" enablePositionIncrements="true"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
                outputUnigrams="true" tokenSeparator=""/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

These are the current docs in my index:

<result name="response" numFound="3" start="0">
<doc>
<str name="id">2</str>
<str name="title">Icecream</str>
<long name="_version_">1475063961342705664</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
</result>
</response>

Query:
http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true

Response:

<result name="response" numFound="2" start="0">
<doc>
<str name="id">1</str>
<str name="title">Ice Cream</str>
<long name="_version_">1475063961203245056</long>
</doc>
<doc>
<str name="id">3</str>
<str name="title">Ice-cream</str>
<long name="_version_">1475063961344802816</long>
</doc>
</result>
<lst name="debug">
<str name="rawquerystring">title:ice cream</str>
<str name="querystring">title:ice cream</str>
<str name="parsedquery">
(+(title:ice DisjunctionMaxQuery((title:cream))))/no_coord
</str>
<str name="parsedquery_toString">+(title:ice (title:cream))</str>
<lst name="explain">
<str name="1">
0.875 = (MATCH) sum of:
  0.4375 = (MATCH) weight(title:ice in 0) [DefaultSimilarity], result of:
    0.4375 = score(doc=0,freq=2.0 = termFreq=2.0), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.61871845 = fieldWeight in 0, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.4375 = fieldNorm(doc=0)
  0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
    0.4375 = score(doc=0,freq=2.0 = termFreq=2.0), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.61871845 = fieldWeight in 0, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.4375 = fieldNorm(doc=0)
</str>
<str name="3">
0.70710677 = (MATCH) sum of:
  0.35355338 = (MATCH) weight(title:ice in 2) [DefaultSimilarity], result of:
    0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.5 = fieldWeight in 2, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.5 = fieldNorm(doc=2)
  0.35355338 = (MATCH) weight(title:cream in 2) [DefaultSimilarity], result of:
    0.35355338 = score(doc=2,freq=1.0 = termFreq=1.0), product of:
      0.70710677 = queryWeight, product of:
        1.0 = idf(docFreq=2, maxDocs=3)
        0.70710677 = queryNorm
      0.5 = fieldWeight in 2, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
        1.0 = idf(docFreq=2, maxDocs=3)
        0.5 = fieldNorm(doc=2)
</str>
</lst>

It is still not working: the query returns "Ice Cream" and "Ice-cream" but
not "Icecream".
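
To make clear what I expect from the shingle filter, here is a rough Python
sketch of the token stream for maxShingleSize="2", outputUnigrams="true" and
tokenSeparator="". It is only an illustration, not Solr's implementation; the
function name `shingles` and its parameters are made up for this example.

```python
def shingles(tokens, max_size=2, separator="", output_unigrams=True):
    """Return unigrams plus joined runs of adjacent tokens, up to max_size.

    Hypothetical helper that mimics the shingling idea only; it ignores
    positions, offsets and every other filter in the analyzer chain.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        # Join this token with the following ones to form each shingle size.
        for size in range(2, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


# Both words analyzed together: the joined shingle is produced.
print(shingles(["ice", "cream"]))               # ['ice', 'icecream', 'cream']

# Each word analyzed on its own: no shingle can be formed.
print(shingles(["ice"]) + shingles(["cream"]))  # ['ice', 'cream']
```

If the query analyzer only ever receives one word at a time, no shingle can
form, which would be consistent with the parsedquery above containing only
title:ice and title:cream. Whether that is what actually happens here is an
assumption on my part; the debug output alone does not prove it.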


On Fri, May 30, 2014 at 9:21 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> I'd spend some time with the admin/analysis page to understand the exact
> tokenization going on here. For instance, sequencing the
> ShingleFilterFactory before WordDelimiterFilterFactory may produce
> "interesting" results. And then throwing the Snowball factory at it and
> putting synonyms in front.... I suspect you're not indexing or searching
> what you think you are.
>
> Second, what happens when you query with &debug=query? That'll show you
> what the search string looks like.
>
> If that doesn't help, please post the results of looking at those things
> here, that'll provide some information for us to work with.
>
> Best,
> Erick
>
>
> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> sunshineglassof2day@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Any updates?
> >
> >
> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > sunshineglassof2day@gmail.com> wrote:
> >
> > > Dear Team,
> > >
> > > How can I handle compound-word searches in Solr?
> > > How can I search for "hand bag" if I have "handbag" in my index? With
> > > the shingle filter in the query analyzer, the query "ice cube" creates
> > > three tokens: "ice", "cube" and "icecube". Only "ice" and "cube" are
> > > searched, but not "icecube"; i.e., it is not working for the pair even
> > > though I am using the shingle filter.
> > >
> > > Here's the schema config.
> > >
> > >
> > >    <fieldType name="text" class="solr.TextField"
> > >               positionIncrementGap="100">
> > >      <analyzer type="index">
> > >        <filter class="solr.SynonymFilterFactory"
> > >                synonyms="synonyms_text_prime_index.txt" ignoreCase="true"
> > >                expand="true"/>
> > >        <charFilter class="solr.HTMLStripCharFilterFactory"/>
> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
> > >        <filter class="solr.ShingleFilterFactory"
> > >                maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > >        <filter class="solr.WordDelimiterFilterFactory"
> > >                catenateWords="1" catenateNumbers="1" catenateAll="1"
> > >                preserveOriginal="1"
> > >                generateWordParts="1" generateNumberParts="1"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.SnowballPorterFilterFactory"
> > >                language="English" protected="protwords.txt"/>
> > >      </analyzer>
> > >      <analyzer type="query">
> > >        <tokenizer class="solr.StandardTokenizerFactory"/>
> > >        <filter class="solr.SynonymFilterFactory"
> > >                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > >        <filter class="solr.ShingleFilterFactory"
> > >                maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
> > >        <filter class="solr.WordDelimiterFilterFactory"
> > >                preserveOriginal="1"/>
> > >        <filter class="solr.LowerCaseFilterFactory"/>
> > >        <filter class="solr.SnowballPorterFilterFactory"
> > >                language="English" protected="protwords.txt"/>
> > >      </analyzer>
> > >    </fieldType>
> > >
> > >    Any help is appreciated.
> > >
> > >
> >
>
