lucene-solr-user mailing list archives

From Ravish Bhagdev <ravish.bhag...@gmail.com>
Subject Re: No Effect of omitNorms and omitTermFreqAndPositions when using MLT handler?
Date Mon, 21 May 2012 10:03:54 GMT
Ahh, is this because I have to override DefaultSimilarity to turn off
tf/idf scoring?  But won't that apply to all fields, including general
search on text fields?  Is there a way to apply a custom similarity only to
specific field types or fields?  Is there no way of turning TF/IDF off
without this?

Thanks,
Ravish

On Mon, May 21, 2012 at 10:24 AM, Ravish Bhagdev
<ravish.bhagdev@gmail.com> wrote:

> Hi All,
>
> I was wondering whether omitNorms has any effect on the MLT handler at all.
>
> I'm using schema version 1.2 with Solr 1.4 and have defined a couple of
> fields that I want to use for MLT lookup. I don't want factors like
> field length or TF/IDF to affect the scores.  The definitions are below:
>
> <fieldType name="lowercase" class="solr.TextField"
>            positionIncrementGap="100" omitNorms="true"
>            omitTermFreqAndPositions="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>   </analyzer>
> </fieldType>
>
> <fieldType name="text_nonorms" class="solr.TextField"
>            positionIncrementGap="100" omitNorms="true"
>            omitTermFreqAndPositions="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldType>
>
> <!-- and the fields that use the above field types are -->
> <field name="PROFILE_TAGS" type="lowercase" indexed="true" stored="true"
>        multiValued="true" termVectors="true"/>
> <field name="PROFILE_TAGS_TXT" type="text_nonorms" indexed="true"
>        stored="true" multiValued="true" termVectors="true"/>
>
> In my solrconfig.xml I have defined the following for my MLT request handler:
>
> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>   <lst name="defaults">
>     <str name="mlt.fl">PROFILE_TAGS,PROFILE_TAGS_TXT</str>
>     <str name="mlt.qf">PROFILE_TAGS^10.0 PROFILE_TAGS_TXT^2.0</str>
>     <int name="mlt.mindf">1</int>
>     <int name="mlt.mintf">1</int>
>     <str name="fl">id,score</str>
>   </lst>
> </requestHandler>
>
>
> However, when I run my query as follows:
>
> http://localhost:9090/solr/mlt?fl=*,score&start=0&q=id:4417454.matchRecord&qt=/mlt&fq=targetDB:ConnectMeDB&rows=1000&debugQuery=on
>
> the debug scoring info shows the following:
>
> <str name="5042172.matchRecord">
> 0.17156276 = (MATCH) product of:
>   1.4296896 = (MATCH) sum of:
>     0.24737607 = (MATCH) weight(PROFILE_TAGS_TXT:system^5.0 in 1472), product of:
>       0.06376338 = queryWeight(PROFILE_TAGS_TXT:system^5.0), product of:
>         5.0 = boost
>         3.8795946 = idf(docFreq=538, maxDocs=9598)
>         0.0032871156 = queryNorm
>       3.8795946 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:system in 1472), product of:
>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:system)=1)
>         3.8795946 = idf(docFreq=538, maxDocs=9598)
>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>     0.65193653 = (MATCH) weight(PROFILE_TAGS_TXT:adapt^5.0 in 1472), product of:
>       0.10351306 = queryWeight(PROFILE_TAGS_TXT:adapt^5.0), product of:
>         5.0 = boost
>         6.298109 = idf(docFreq=47, maxDocs=9598)
>         0.0032871156 = queryNorm
>       6.298109 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:adapt in 1472), product of:
>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:adapt)=1)
>         6.298109 = idf(docFreq=47, maxDocs=9598)
>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>     0.530377 = (MATCH) weight(PROFILE_TAGS_TXT:optic^5.0 in 1472), product of:
>       0.093365155 = queryWeight(PROFILE_TAGS_TXT:optic^5.0), product of:
>         5.0 = boost
>         5.6806736 = idf(docFreq=88, maxDocs=9598)
>         0.0032871156 = queryNorm
>       5.6806736 = (MATCH) fieldWeight(PROFILE_TAGS_TXT:optic in 1472), product of:
>         1.0 = tf(termFreq(PROFILE_TAGS_TXT:optic)=1)
>         5.6806736 = idf(docFreq=88, maxDocs=9598)
>         1.0 = fieldNorm(field=PROFILE_TAGS_TXT, doc=1472)
>   0.12 = coord(3/25)
> </str>
>
> Which seems to suggest that TF/IDF scoring is still being applied to these
> fields!  Also, does it make any difference whether I specify omitNorms in
> the <field> definition or in the <fieldType> definition?
>
> I would appreciate any help with this.
>
> Thanks,
> Ravish
>
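
For what it's worth, the numbers in the explain output quoted above can be re-derived by hand, which makes it easier to see which factors are actually in play. Below is a rough Java sketch, not Solr code: the class and method names (ExplainCheck, idf, termScore, score) are made up for illustration, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is assumed to be the classic DefaultSimilarity one (it does reproduce the idf values shown). The tf and fieldNorm factors are set to 1.0, as they appear in the output.

```java
// Illustrative only: re-derives the score from the quoted explain output.
// All names here are hypothetical; nothing in this class is a Solr/Lucene API.
class ExplainCheck {

    // Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // One term clause, as laid out in the explain output:
    //   queryWeight = boost * idf * queryNorm
    //   fieldWeight = tf * idf * fieldNorm
    //   weight      = queryWeight * fieldWeight
    static double termScore(double boost, double idf, double queryNorm,
                            double tf, double fieldNorm) {
        double queryWeight = boost * idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    // Sum of the term clauses, multiplied by the coord factor.
    // tf and fieldNorm are fixed at 1.0, matching the values in the output.
    static double score(int[] docFreqs, int maxDocs, double boost,
                        double queryNorm, double coord) {
        double sum = 0.0;
        for (int docFreq : docFreqs) {
            sum += termScore(boost, idf(docFreq, maxDocs), queryNorm, 1.0, 1.0);
        }
        return sum * coord;
    }

    public static void main(String[] args) {
        // docFreq values for "system", "adapt" and "optic" from the output.
        int[] docFreqs = {538, 47, 88};
        double s = score(docFreqs, 9598, 5.0, 0.0032871156, 3.0 / 25.0);
        System.out.println("recomputed score = " + s);
    }
}
```

With the three docFreq values from the output this comes out at roughly 0.1716, in line with the reported 0.17156276. Since tf and fieldNorm are both 1.0 for every term, and boost and queryNorm are the same for all three, the differences between the per-term weights come entirely from idf.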
