lucene-solr-user mailing list archives

From roySolr <royrutten1...@gmail.com>
Subject Re: WhitespaceTokenizer and scoring(field length)
Date Wed, 27 Apr 2011 08:28:22 GMT
I thought it was something simple. Here is my configuration:

<fieldType name="searchType" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
   </analyzer>
</fieldType>

<field name="searchField" type="searchType" indexed="true" stored="true" multiValued="true"/>

<copyField source="name" dest="searchField" maxChars="500"/>
<copyField source="storechain" dest="searchField" maxChars="500"/>
<copyField source="related_category" dest="searchField" maxChars="500"/>

I search for "supermarket":

<doc>
	<str name="companyid">357</str>
	<str name="name">LIDL Headoffice</str>
	<arr name="related_category">
		<str>Supermarkt</str>
	</arr>
	<str name="storechain">LIDL</str>
	<arr name="searchField">
		<str>LIDL</str>
		<str>LIDL Headoffice</str>
		<str>Supermarket</str>
	</arr>
</doc>

<doc>
	<str name="companyid">719</str>
	<str name="name">LIDL</str>
	<arr name="related_category">
		<str>Supermarket</str>
	</arr>
	<str name="storechain">LIDL</str>
	<arr name="searchField">
		<str>LIDL</str>
		<str>LIDL</str>
		<str>Supermarket</str>
	</arr>
</doc>



debugQuery:
Both documents have the same score, but doc 357 has more characters in the
searchField.

<lst name="explain">
	<str name="357">
		1.4330883 = (MATCH) fieldWeight(searchField:supermarket in 325), product of:
		  1.0 = tf(termFreq(searchField:supermarket)=1)
		  2.8661766 = idf(docFreq=3194, maxDocs=20651)
		  0.5 = fieldNorm(field=searchField, doc=325)
	</str>
	
	<str name="719">
		1.4330883 = (MATCH) fieldWeight(searchField:supermarket in 678), product of:
		  1.0 = tf(termFreq(searchField:supermarket)=1)
		  2.8661766 = idf(docFreq=3194, maxDocs=20651)
		  0.5 = fieldNorm(field=searchField, doc=678)
	</str>
</lst>
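
The numbers in the explain output can be reproduced by hand. What follows is a minimal sketch, not Solr's actual code. It assumes the classic Lucene similarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), length norm = 1 / sqrt(number of terms)) and that the norm is stored in a single byte, which loses precision. The function names are made up for illustration, the byte encoding is my reading of the Lucene 3.x SmallFloat scheme, and counting tokens with a plain whitespace split ignores whatever the synonym filter may do. Under those assumptions a searchField with 3 tokens and one with 4 tokens both decode to a fieldNorm of 0.5, which would be consistent with the identical scores.

```python
import math
import struct


def float_to_byte315(f):
    """Encode a float into one byte (assumed Lucene 3.x norm encoding)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21                # keep only the top bits of the float
    fzero = (63 - 15) << 3
    if small <= fzero:
        return 0 if bits <= 0 else 1  # underflow: zero or smallest value
    if small >= fzero + 0x100:
        return 255                    # overflow: largest value
    return small - fzero


def byte315_to_float(b):
    """Decode the byte back into a float; precision lost on encode stays lost."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def count_tokens(values):
    """Approximate the token count of a multiValued field by whitespace split."""
    return sum(len(v.split()) for v in values)


def stored_field_norm(values):
    """Length norm 1/sqrt(numTerms), pushed through the one-byte encoding."""
    raw = 1.0 / math.sqrt(count_tokens(values))
    return byte315_to_float(float_to_byte315(raw))


def idf(doc_freq, max_docs):
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    return math.sqrt(term_freq) * idf(doc_freq, max_docs) * field_norm


# searchField values copied from the two documents above
doc_357 = ["LIDL", "LIDL Headoffice", "Supermarket"]  # 4 tokens
doc_719 = ["LIDL", "LIDL", "Supermarket"]             # 3 tokens

norm_357 = stored_field_norm(doc_357)
norm_719 = stored_field_norm(doc_719)
score_357 = field_weight(1, 3194, 20651, norm_357)
score_719 = field_weight(1, 3194, 20651, norm_719)

print(norm_357, norm_719)
print(score_357, score_719)
```

If this reading is right, the difference between the two documents is simply too small to survive the one-byte norm, so a length difference of a single token cannot change the ranking here.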

--
View this message in context: http://lucene.472066.n3.nabble.com/WhitespaceTokenizer-and-scoring-field-length-tp2865784p2869546.html
Sent from the Solr - User mailing list archive at Nabble.com.
