lucene-solr-user mailing list archives

From Andy Pickler <andy.pick...@gmail.com>
Subject MoreLikeThis - No Results
Date Wed, 22 May 2013 18:20:29 GMT
I'm developing a recommendation feature in our app using the
MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>, and
so far it is doing a great job.  We're using a user's "competency keywords"
as the MLT field list and the user's corresponding document in Solr as the
"comparison document".  I have found that for one user I'm not receiving
any recommendations, and I'm not sure why.

Solr: 4.1.0

*relevant schema*:

<field name="competencyKeywords" type="short-mlt-text" indexed="true"
       stored="true" multiValued="true" termVectors="true"/>

<fieldType name="short-mlt-text" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

*user's values*:

<arr name="competencyKeywords">
<str>Healthcare Cost Trends</str>
</arr>

Is it possible that, among the ~40,000 users in this index (about 500 of
whom have the same competency keywords), Lucene simply judges the words
"healthcare", "cost" and "trends" not to be "significant"?  I
realize that I may not understand how the MLT Handler is doing things under
the covers...I've only been guessing until now based on the (otherwise
excellent) results I've been seeing.

Thanks,
Andy Pickler

P.S.  For some additional information, the following query:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false

...produces the following results...

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="interestingTerms"/>
<lst name="debug">
<str name="rawquerystring">objectId:user91813</str>
<str name="querystring">objectId:user91813</str>
<str name="parsedquery"/>
<str name="parsedquery_toString"/>
<lst name="explain"/>
</lst>
</response>
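
The empty "interestingTerms" list above suggests that no terms were selected from the source document at all. One thing that may be worth ruling out is the handler's frequency thresholds: as far as I know, the documented defaults are mlt.mintf=2 (a term must occur at least twice in the source document) and mlt.mindf=5, and each word in "Healthcare Cost Trends" occurs only once in this user's field. Whether that is the actual cause here is unverified. The following is a minimal Python sketch that rebuilds the request above with those two parameters lowered; the base URL and the build_mlt_url helper are hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Hypothetical base URL (assumption); substitute your own host and core.
SOLR_BASE = "http://localhost:8983/solr"


def build_mlt_url(object_id, min_tf=None, min_df=None, base=SOLR_BASE):
    """Build the /mlt request from the post, optionally overriding
    the mlt.mintf and mlt.mindf thresholds."""
    params = [
        ("q", "objectId:" + object_id),
        ("mlt.fl", "competencyKeywords"),
        ("mlt.interestingTerms", "details"),
        ("debugQuery", "true"),
        ("mlt.match.include", "false"),
    ]
    # Only send the thresholds when the caller overrides them, so the
    # handler's own defaults apply otherwise.
    if min_tf is not None:
        params.append(("mlt.mintf", str(min_tf)))
    if min_df is not None:
        params.append(("mlt.mindf", str(min_df)))
    return base + "/mlt?" + urlencode(params)


if __name__ == "__main__":
    # Same request as in the post, with both thresholds lowered to 1.
    print(build_mlt_url("user91813", min_tf=1, min_df=1))
```

If the lowered thresholds make "interestingTerms" non-empty for this user, the defaults were filtering out the single-occurrence terms; if the list stays empty, the cause lies elsewhere.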
