lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.
Date Mon, 22 Apr 2013 21:25:03 GMT
This doesn't make a lot of sense to me, as in both cases the very first collation it tries is
the one it returns.  So you're getting a very optimized spellcheck in both cases.  But
it does have to issue a query twice in each case:  first it tries the user's main query
and, finding there are not enough hits, it then tries the collation query to see how many hits that
will return.  Could it be that these two queries simply differ in cost, and that the difference
gets magnified by running each twice?
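
The two-pass behaviour described above can be sketched roughly as follows. This is only an illustration, not Solr's actual implementation: `first_working_collation`, `count_hits`, and the toy index are hypothetical names, and the sketch assumes "not enough hits" simply means zero hits. The hit counts are the ones reported in the responses quoted below.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_working_collation(
    count_hits: Callable[[str], int],
    user_query: str,
    candidates: Iterable[str],
) -> Tuple[Optional[str], int, int]:
    """Return (collation, hits, queries_issued) for one spellcheck request.

    Hypothetical helper: count_hits stands in for running a query against
    the index and returning its hit count.
    """
    queries_issued = 1  # first pass: the user's own query
    if count_hits(user_query) > 0:
        return None, 0, queries_issued  # assumed: any hit means no collation is needed

    # Second pass: try candidate collations in order until one returns hits.
    for candidate in candidates:
        queries_issued += 1
        hits = count_hits(candidate)
        if hits > 0:
            return candidate, hits, queries_issued
    return None, 0, queries_issued


# Toy index standing in for a real search (assumption for illustration only).
FAKE_HITS = {"chocolate factory": 85, "pursuit happyness": 10}


def fake_count_hits(query: str) -> int:
    return FAKE_HITS.get(query.lower(), 0)


result = first_working_collation(
    fake_count_hits, "Chocolat Factry", ["chocolate factory"]
)
print(result)
```

When the first candidate collation already returns hits, exactly two queries are issued per request, so any cost difference between the underlying queries is paid twice.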

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: SandeepM [mailto:skmirch@hotmail.com] 
Sent: Monday, April 22, 2013 4:04 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

Chocolat Factry


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">77</int>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="chocolat">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">8</int>
      <int name="origFreq">615</int>
      <arr name="suggestion">
        <lst>
          <str name="word">chocolate</str>
          <int name="freq">6544</int>
        </lst>
      </arr>
    </lst>
    <lst name="factry">
      <int name="numFound">5</int>
      <int name="startOffset">9</int>
      <int name="endOffset">15</int>
      <int name="origFreq">6</int>
      <arr name="suggestion">
        <lst>
          <str name="word">factory</str>
          <int name="freq">23614</int>
        </lst>
        <lst>
          <str name="word">factor</str>
          <int name="freq">5128</int>
        </lst>
        <lst>
          <str name="word">factus</str>
          <int name="freq">290</int>
        </lst>
        <lst>
          <str name="word">factum</str>
          <int name="freq">178</int>
        </lst>
        <lst>
          <str name="word">factae</str>
          <int name="freq">102</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">chocolate factory</str>
      <int name="hits">85</int>
      <lst name="misspellingsAndCorrections">
        <str name="chocolat">chocolate</str>
        <str name="factry">factory</str>
      </lst>
    </lst>
  </lst>
</lst>
</response>




Pursut Hapyness

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">16</int>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pursut">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">6</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">pursuit</str>
          <int name="freq">1209</int>
        </lst>
        <lst>
          <str name="word">pursue</str>
          <int name="freq">108</int>
        </lst>
        <lst>
          <str name="word">pursit</str>
          <int name="freq">1</int>
        </lst>
        <lst>
          <str name="word">perdut</str>
          <int name="freq">94</int>
        </lst>
        <lst>
          <str name="word">purdue</str>
          <int name="freq">70</int>
        </lst>
      </arr>
    </lst>
    <lst name="hapyness">
      <int name="numFound">5</int>
      <int name="startOffset">7</int>
      <int name="endOffset">15</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst>
          <str name="word">happyness</str>
          <int name="freq">175</int>
        </lst>
        <lst>
          <str name="word">hapiness</str>
          <int name="freq">62</int>
        </lst>
        <lst>
          <str name="word">hayness</str>
          <int name="freq">1</int>
        </lst>
        <lst>
          <str name="word">happiness</str>
          <int name="freq">7788</int>
        </lst>
        <lst>
          <str name="word">harkness</str>
          <int name="freq">324</int>
        </lst>
      </arr>
    </lst>
    <bool name="correctlySpelled">false</bool>
    <lst name="collation">
      <str name="collationQuery">pursuit happyness</str>
      <int name="hits">10</int>
      <lst name="misspellingsAndCorrections">
        <str name="pursut">pursuit</str>
        <str name="hapyness">happyness</str>
      </lst>
    </lst>
  </lst>
</lst>
</response>
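
For comparing responses like the two above side by side, a small script can pull out the QTime and the collation details. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `summarize_spellcheck` and the trimmed `SAMPLE` document are made up for illustration, and the sketch assumes the response layout shown above (a `responseHeader` block plus at most one `collation` block).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first response above, kept only to make the sketch runnable.
SAMPLE = """
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">77</int>
</lst>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="collation">
      <str name="collationQuery">chocolate factory</str>
      <int name="hits">85</int>
    </lst>
  </lst>
</lst>
</response>
"""


def summarize_spellcheck(xml_text: str) -> dict:
    """Extract QTime, the collation query, and its hit count from a response."""
    root = ET.fromstring(xml_text.strip())
    qtime = root.find("./lst[@name='responseHeader']/int[@name='QTime']")
    collation = root.find(".//lst[@name='collation']")

    summary = {"qtime": int(qtime.text), "collation": None, "hits": None}
    if collation is not None:
        # The collation block may be absent if no correction was found.
        summary["collation"] = collation.find("./str[@name='collationQuery']").text
        summary["hits"] = int(collation.find("./int[@name='hits']").text)
    return summary


print(summarize_spellcheck(SAMPLE))
```

Running this over many saved responses makes it easier to see whether QTime tracks the number of suggestions, the collation hit count, or neither.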

Spellcheck is used separately, and we are not using any q along with
spellcheck.

Our search query also queries other fields, not just spellcheck, and
therefore does not give a good representation of QTime.  We use grouping
in the search query.
For Chocolate Factory, I get a search QTime of 198 ms.
For Pursuit Happyness, I get a search QTime of 318 ms.

Would appreciate your insights.
Thanks.
-- Sandeep




--
View this message in context: http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058086.html
Sent from the Solr - User mailing list archive at Nabble.com.


