lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
Date Fri, 19 Apr 2013 14:33:46 GMT
I guess the first thing I'd do is set "maxCollationTries" to zero. This means it will run
your main query only once and will not re-run it to check the collations. Then see whether
your queries have consistent QTime. One easy explanation is that with "maxCollationTries=10",
it may be running your query up to 11 times to check up to 10 possible collations. If the
query takes 50ms by itself, then you've spent 550ms in total without finding any spelling
corrections. Unfortunately, the worst case here is the one that gives the user nothing back.
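A minimal sketch of that first step, assuming the change is made in the "defaults" list of the /select handler quoted below (the same value could instead be passed on a single request as spellcheck.maxCollationTries=0, since request parameters override handler defaults):

```xml
<!-- Sketch only: this entry replaces the existing maxCollationTries default;
     every other default in the original handler stays unchanged. -->
<lst name="defaults">
  <!-- 0 = return collations without re-running the main query to verify them,
       so the main query executes exactly once per request -->
  <str name="spellcheck.maxCollationTries">0</str>
</lst>
```

If QTime becomes consistent with this setting, the extra time is being spent on collation verification rather than on generating corrections.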

Another thing to try: with "maxCollationTries" at zero, set "maxCollations" to 10. This
will give you a list of the 10 collations it would have tried. You can then check whether the
one that gets hits is far enough down the list to explain the high total QTime when
"maxCollationTries=10". If this explains it, the obvious solution is to set "maxCollationTries"
to something lower than 10 (you'll need to weigh how long you're willing to make your users
wait to possibly get spelling suggestions). Alternatively, use "spellcheck.q" to give it a
query that is easier to evaluate than the main query (but that can still produce valid
collations). Also, see https://issues.apache.org/jira/browse/SOLR-3240, which is an
optimization for this feature.
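The diagnostic run described above might look like this in the same "defaults" list. This is a sketch; only these two entries differ from the original configuration:

```xml
<!-- Sketch only: diagnostic settings, not meant for production traffic. -->
<lst name="defaults">
  <!-- Do not re-run the main query to verify collations... -->
  <str name="spellcheck.maxCollationTries">0</str>
  <!-- ...but return up to 10 unverified collations, i.e. the candidates
       that would otherwise have been tried against the index -->
  <str name="spellcheck.maxCollations">10</str>
</lst>
```

If the collation that actually returns hits sits near the top of that list, a smaller spellcheck.maxCollationTries value such as 3 (an illustrative number, not one suggested in this thread) would cap the worst case at about four executions of the main query instead of eleven.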

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: SandeepM [mailto:skmirch@hotmail.com] 
Sent: Thursday, April 18, 2013 11:33 PM
To: solr-user@lucene.apache.org
Subject: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

Hi!

I am using SOLR 4.2.1.

My solrconfig.xml contains the following:

  <searchComponent name="MySpellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_spell</str>

    <lst name="spellchecker">
      <str name="name">MySpellchecker</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">3</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>
  </searchComponent>

<requestHandler name="/select" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <int name="rows">10</int>
      <str name="df">id</str>
      <str name="spellcheck.dictionary">MySpellchecker</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">10</str>
      <str name="spellcheck.maxResultsForSuggest">35</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">false</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">1</str>
      <str name="spellcheck.collateParam.q.op">AND</str>
    </lst>
    <arr name="last-components">
      <str>MySpellcheck</str>
    </arr>
  </requestHandler>

The spell field and its field type in schema.xml look like this:

<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

<field name="spell" type="text_spell" indexed="true" stored="false" multiValued="true" />

<copyField source="title" dest="spell" />
<copyField source="artist" dest="spell" />

My query:
http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

In this case, the intent is to correct "chocolat factry" to "chocolate
factory", which exists in my spell field index. I see a QTime of between
350 and 400 ms for the above query.

When I run a similar query with the spellcheck terms replaced by "pursut
hapyness" (where "pursuit happyness" actually exists in my spell field), I see
a QTime of 15-17 ms.

Both queries produce correct collations, but there is an order-of-magnitude
difference in QTime. There is one edit per term in both cases, i.e. two edits
per query, and the words in both queries are about the same length. I'd like
to understand why there is such a large difference in QTime. I would
appreciate any help with this, since I am not sure how to get meaningful
performance numbers or attribute the slowness to anything in particular.

I also see a large difference in QTime in another case. Replace the search
terms in the above query with "over cuckoo's nest", "over cuccoo's nst",
etc. "over cuckoo's nest" exists in my indexed spell field, so I would expect
it to be found almost immediately. Instead, this query produces no collation
and takes 10 seconds, while the second query, "over cuccoo's nst", corrects
the phrase and returns in 24 ms. Something does not seem right here.

I would appreciate help with these.

Thanks in advance.
Regards,
-- Sandeep



--
View this message in context: http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176.html
Sent from the Solr - User mailing list archive at Nabble.com.


