lucene-solr-user mailing list archives

From "Reitzel, Charles" <Charles.Reit...@tiaa-cref.org>
Subject RE: Collations are not working fine.
Date Tue, 17 Feb 2015 22:23:26 GMT
Hi Nitin,

I was trying many different options for a couple of different queries.   In fact, I have collations
working OK now with the Suggester and WFSTLookup.   The earlier problem may have been due to a different
dictionary and/or lookup implementation and the specific options I was sending.

In general, we're using spellcheck for search suggestions.   The Suggester component (as opposed to
the Suggester spellcheck implementation) doesn't handle all of our cases, but we can get things
working using the spellcheck interface.  What gives us particular trouble are the cases where
a term is valid by itself but is also the start of longer words.

The specific terms are acronyms specific to our business.   But I'll attempt to show generic
examples.

E.g. a partial term like "fo" can expand to "fox", "fog", etc., and a full term like "brown" can
also expand to something like "brownstone".   And, yes, the collation "brownstone fox" is nonsense.
But assume, for the sake of argument, that it appears in our documents somewhere.

For a multiple-term query with a spelling error (or a partially typed term):  brown fo

We get collations in descending order of hits, like ...
"brown fox",
"brown fog",
"brownstone fox".

So far, so good.  
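
A minimal sketch of that ordering on the client side, assuming the collations were requested
with spellcheck.collateExtendedResults=true (which the handler config below does not set) and
have already been parsed into a list of dicts with "collationQuery" and "hits" keys. The
function name and the hit counts are hypothetical, for illustration only.

```python
def order_collations(collations):
    """Return collation query strings ordered by hit count, highest first."""
    # sorted() is stable, so collations with equal hits keep their original order.
    ranked = sorted(collations, key=lambda c: c["hits"], reverse=True)
    return [c["collationQuery"] for c in ranked]


# Hypothetical sample data; the hit counts are made up.
sample = [
    {"collationQuery": "brown fog", "hits": 7},
    {"collationQuery": "brownstone fox", "hits": 1},
    {"collationQuery": "brown fox", "hits": 42},
]
print(order_collations(sample))
```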

For a single-term query, brown, we get a single suggestion, brownstone, and no collations.

So we have no way of knowing that the original term brown is itself valid and should be kept!

At this point, we need to set spellcheck.extendedResults=true and look at the origFreq value in the
suggested corrections.  Unfortunately, the Suggester (spellcheck dictionary) does not populate
the original frequency information.  And, without this information, the SpellCheckComponent
cannot format the extended results.

However, with a simple change to Suggester.java, it was easy to get the needed frequency information
and use it to make a sound decision to keep or drop the input term.   But I'd be much obliged
if someone could point out a better way to go about it.
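
To illustrate the keep-or-drop decision, here is a rough client-side sketch. It assumes the
"suggestions" part of the extended spellcheck response arrives as a flat
[name, value, name, value, ...] list and that each term's entry carries an origFreq count
(which, with the Suggester, requires the patch mentioned above). The function name and the
sample values are hypothetical.

```python
def should_keep_term(term, suggestions):
    """Decide whether the term the user typed should itself be kept.

    `suggestions` is assumed to be the flat [name, value, name, value, ...]
    list from the "suggestions" part of an extended spellcheck response.
    """
    for name, entry in zip(suggestions[0::2], suggestions[1::2]):
        if name == term and isinstance(entry, dict):
            # origFreq > 0 means the typed term itself occurs in the index.
            return entry.get("origFreq", 0) > 0
    # No entry for this term: nothing tells us to drop it, so keep it.
    return True


# Hypothetical sample entries; the frequencies are made up.
valid_term = ["brown", {"numFound": 1, "origFreq": 12,
                        "suggestion": [{"word": "brownstone", "freq": 3}]}]
partial_term = ["fo", {"numFound": 2, "origFreq": 0,
                       "suggestion": [{"word": "fox", "freq": 40},
                                      {"word": "fog", "freq": 9}]}]
print(should_keep_term("brown", valid_term))
print(should_keep_term("fo", partial_term))
```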

Configs below.

Thanks,
Charlie

<!-- SpellCheck component -->
  <searchComponent class="solr.SpellCheckComponent" name="suggestSC">
    <lst name="spellchecker">
      <str name="name">suggestDictionary</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
      <str name="field">text_all</str>
      <float name="threshold">0.00000001</float>
      <str name="exactMatchFirst">true</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

<!-- Request Handler -->
<requestHandler name="/tcSuggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="title">Search Suggestions (spellcheck)</str>
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="rows">0</str>
    <str name="defType">edismax</str>
    <str name="df">text_all</str>
    <str name="fl">id,name,ticker,entityType,transactionType,accountType</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">suggestDictionary</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>suggestSC</str>
  </arr>
</requestHandler>
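
Since the handler defaults above already supply the spellcheck parameters, a request only
needs to pass the user's input as q. A minimal sketch of building such a request URL, where
the base URL (host, port, and core name) is a placeholder and the helper name is hypothetical:

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, user_input):
    """Build a request URL for the /tcSuggest handler defined above."""
    # urlencode() escapes spaces, quotes, and other reserved characters.
    return f"{base_url}/tcSuggest?{urlencode({'q': user_input})}"


# Placeholder base URL; substitute your own host and core.
print(build_suggest_url("http://localhost:8983/solr/mycore", "brown fo"))
```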

-----Original Message-----
From: Nitin Solanki [mailto:nitinmlvya@gmail.com] 
Sent: Tuesday, February 17, 2015 3:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi Charles,
Will you please send the configuration which you tried? It will help to solve
my problem. Have you sorted the collations on hits or on frequencies of suggestions? If you did,
then please assist me.

On Mon, Feb 16, 2015 at 7:59 PM, Reitzel, Charles < Charles.Reitzel@tiaa-cref.org> wrote:

> I have been working with collations the last couple days and I kept adding
> the collation-related parameters until it started working for me.   It
> seems I needed <str name="spellcheck.collateMaxCollectDocs">50</str>.
>
> But, I am using the Suggester with the WFSTLookupFactory.
>
> Also, I needed to patch the suggester to get frequency information in 
> the spellcheck response.
>
> -----Original Message-----
> From: Rajesh Hazari [mailto:rajeshhazari@gmail.com]
> Sent: Friday, February 13, 2015 3:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Collations are not working fine.
>
> Hi Nitin,
>
> Can you try the config below? It seems to be working for us.
>
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>
>      <str name="queryAnalyzerFieldType">text_general</str>
>
>
>   <lst name="spellchecker">
> <str name="name">wordbreak</str>
> <str name="classname">solr.WordBreakSolrSpellChecker</str>
> <str name="field">textSpell</str>
> <str name="combineWords">true</str>
> <str name="breakWords">false</str>
> <int name="maxChanges">5</int>
>   </lst>
>
>    <lst name="spellchecker">
> <str name="name">default</str>
> <str name="field">textSpell</str>
> <str name="classname">solr.IndexBasedSpellChecker</str>
> <str name="spellcheckIndexDir">./spellchecker</str>
> <str name="accuracy">0.75</str>
> <float name="thresholdTokenFrequency">0.01</float>
> <str name="buildOnCommit">true</str>
> <str name="spellcheck.maxResultsForSuggest">5</str>
>      </lst>
>
>
>   </searchComponent>
>
>
>
> <str name="spellcheck">true</str>
> <str name="spellcheck.dictionary">default</str>
> <str name="spellcheck.dictionary">wordbreak</str>
> <int name="spellcheck.count">5</int>
> <str name="spellcheck.alternativeTermCount">15</str>
> <str name="spellcheck.collate">true</str>
> <str name="spellcheck.onlyMorePopular">false</str>
> <str name="spellcheck.extendedResults">true</str>
> <str name ="spellcheck.maxCollations">100</str>
> <str name="spellcheck.collateParam.mm">100%</str>
> <str name="spellcheck.collateParam.q.op">AND</str>
> <str name="spellcheck.maxCollationTries">1000</str>
>
>
> *Rajesh.*
>
> On Fri, Feb 13, 2015 at 1:01 PM, Dyer, James 
> <James.Dyer@ingramcontent.com
> >
> wrote:
>
> > Nitin,
> >
> > Can you post the full spellcheck response when you query:
> >
> > q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell
> >
> > James Dyer
> > Ingram Content Group
> >
> >
> > -----Original Message-----
> > From: Nitin Solanki [mailto:nitinmlvya@gmail.com]
> > Sent: Friday, February 13, 2015 1:05 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Collations are not working fine.
> >
> > Hi James Dyer,
> > I did as you suggested and used WordBreakSolrSpellChecker instead of
> > shingles, but collations are still not being returned.
> > For instance, I tried to get the collation "gone with the wind" by
> > searching "gone wthh thes wint" on field=gram_ci but didn't succeed,
> > even though I am getting the suggestions wthh as *with*, thes as *the*,
> > and wint as *wind*.
> > Also, "gone with the wind" occurs 167 times in my documents. I don't
> > know whether I am missing something.
> > Please check my Solr configuration below:
> >
> > *URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes 
> > wint"&wt=json&indent=true&shards.qt=/spell
> >
> > *solrconfig.xml:*
> >
> > <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
> >     <str name="queryAnalyzerFieldType">textSpellCi</str>
> >     <lst name="spellchecker">
> >       <str name="name">default</str>
> >       <str name="field">gram_ci</str>
> >       <str name="classname">solr.DirectSolrSpellChecker</str>
> >       <str name="distanceMeasure">internal</str>
> >       <float name="accuracy">0.5</float>
> >       <int name="maxEdits">2</int>
> >       <int name="minPrefix">0</int>
> >       <int name="maxInspections">5</int>
> >       <int name="minQueryLength">2</int>
> >       <float name="maxQueryFrequency">0.9</float>
> >       <str name="comparatorClass">freq</str>
> >     </lst>
> > <lst name="spellchecker">
> >       <str name="name">wordbreak</str>
> >       <str name="classname">solr.WordBreakSolrSpellChecker</str>
> >       <str name="field">gram</str>
> >       <str name="combineWords">true</str>
> >       <str name="breakWords">true</str>
> >       <int name="maxChanges">5</int>
> >     </lst>
> > </searchComponent>
> >
> > <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
> >     <lst name="defaults">
> >       <str name="df">gram_ci</str>
> >       <str name="spellcheck.dictionary">default</str>
> >       <str name="spellcheck">on</str>
> >       <str name="spellcheck.extendedResults">true</str>
> >       <str name="spellcheck.count">25</str>
> >       <str name="spellcheck.onlyMorePopular">true</str>
> >       <str name="spellcheck.maxResultsForSuggest">100000000</str>
> >       <str name="spellcheck.alternativeTermCount">25</str>
> >       <str name="spellcheck.collate">true</str>
> >       <str name="spellcheck.maxCollations">50</str>
> >       <str name="spellcheck.maxCollationTries">50</str>
> >       <str name="spellcheck.collateExtendedResults">true</str>
> >     </lst>
> >     <arr name="last-components">
> >       <str>spellcheck</str>
> >     </arr>
> >   </requestHandler>
> >
> > *Schema.xml: *
> >
> > <field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
> > multiValued="false"/>
> >
> > <fieldType name="textSpellCi" class="solr.TextField"
> > positionIncrementGap="100">
> >        <analyzer type="index">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> >     <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> > </analyzer>
> > </fieldType>
> >
>
> **********************************************************************
> *** This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender 
> immediately and then delete it.
>
> TIAA-CREF
> **********************************************************************
> ***
>
