lucene-solr-user mailing list archives

From: "Yonik Seeley" <yo...@apache.org>
Subject: Re: Relevancy Issue - How do I make it work?
Date: Fri, 30 May 2008 02:14:56 GMT

On Thu, May 29, 2008 at 9:44 PM, Tim Christensen <tim@vanns.com> wrote:
> Yonik,
>
> Thank you for the response. You are correct: regular (non-accessory)
> products are boosted by 2.0 at index time. However, both items, the non-iPod
> item and the iPod, would have received the same initial boost on the same
> fields, since they are both non-accessory items.
>
> Is your comment still relevant in that context?

Yes.
There's a bug somewhere that ended up boosting that document or field
much more than normal.
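
To make the size of the anomaly concrete, here is a minimal sketch that
recomputes the two fieldWeight values from the debug output quoted further
down. The function name field_weight is hypothetical; the numbers are copied
from the explain output, and the only assumption is that fieldWeight is the
plain product of the three factors it lists.

```python
# Factors copied from the explain output quoted later in this message.
def field_weight(tf, idf, field_norm):
    """fieldWeight as shown by explain: the product of tf, idf and fieldNorm."""
    return tf * idf * field_norm

top_hit = field_weight(tf=1.0, idf=3.7238584, field_norm=40960.0)    # doc 6247
ipod_sku = field_weight(tf=3.0, idf=3.7238584, field_norm=0.09375)   # doc 6985

print(f"doc 6247 (one 'ipod'):   {top_hit:.2f}")
print(f"doc 6985 (nine 'ipod'):  {ipod_sku:.7f}")

# idf is identical and tf favours the real iPod, so the ranking is decided
# entirely by fieldNorm, which differs by several orders of magnitude.
print(f"fieldNorm ratio: {40960.0 / 0.09375:.0f}")
```

Both products agree with the reported scores up to floating-point rounding,
so nothing else is contributing to the difference.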

The first thing is to determine whether it's in your indexing code or in Solr.
Is there a way for you to verify the exact data you sent to Solr for
that document (the exact XML, if that is what you are sending)?
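
One way to run that check, sketched under the assumption that the indexing
code posts Solr's XML update format (an <add> element containing <doc>
elements, with index-time boosts given as boost attributes on <doc> and
<field>). The helper name report_boosts and the sample document are
hypothetical; substitute the XML your indexer actually sent.

```python
import xml.etree.ElementTree as ET

def report_boosts(update_xml):
    """Return (doc_id, doc_boost, {field_name: [boosts]}) for each <doc>."""
    results = []
    root = ET.fromstring(update_xml)
    for doc in root.iter("doc"):
        # A missing boost attribute means the default boost of 1.0.
        doc_boost = float(doc.get("boost", "1.0"))
        doc_id = None
        field_boosts = {}
        for field in doc.findall("field"):
            name = field.get("name")
            if name == "id":
                doc_id = field.text
            if field.get("boost") is not None:
                field_boosts.setdefault(name, []).append(float(field.get("boost")))
        results.append((doc_id, doc_boost, field_boosts))
    return results

# Hypothetical sample; not the poster's real data.
sample = """<add>
  <doc boost="2.0">
    <field name="id">EXAMPLE-1</field>
    <field name="title" boost="2.0">Example product</field>
    <field name="description">Example description</field>
  </doc>
</add>"""

for doc_id, doc_boost, field_boosts in report_boosts(sample):
    print(doc_id, "doc boost:", doc_boost, "field boosts:", field_boosts)
```

If the boosts in the captured XML look as intended for the problem document,
that points away from the indexing code and toward Solr.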

-Yonik


> Tim
>
> On May 29, 2008, at 7:30 PM, Yonik Seeley wrote:
>
>> Field norms of un-boosted fields are normally less than 1 (it's a
>> factor that weights larger fields less).
>> The index-time boost is also multiplied into this factor though.
>> Given that your first doc had a huge norm, it looks like the document
>> or field was boosted at index time?
>>
>> -Yonik
>>
>> On Thu, May 29, 2008 at 9:22 PM, Tim Christensen <tim@vanns.com> wrote:
>>>
>>> Hi,
>>>
>>> This is my first post. I have been working with Lucene for about 4 weeks
>>> and Solr for just about 10 days. We are going to convert our site search
>>> over to Solr as soon as we figure out some of the nuances.
>>>
>>> As I was testing out the synonyms feature to decide how we could best use
>>> it, I searched for iPod (I know it is an example, but we actually sell
>>> them). I was shocked when the search results were nothing close to an iPod.
>>>
>>> Looking closer, I could see that the description contained the word iPod
>>> just once. With debug on, that fact is confirmed (this is the first result):
>>> <str name="id=502999430,internal_docid=6247">
>>> 152529.23 = (MATCH) fieldWeight(search_text:ipod in 6247), product of:
>>> 1.0 = tf(termFreq(search_text:ipod)=1)
>>> 3.7238584 = idf(docFreq=522)
>>> 40960.0 = fieldNorm(field=search_text, doc=6247)
>>> </str>
>>> Here is the explainOther output for an actual iPod SKU (in the same search):
>>> <str name="otherQuery">id:650085488</str>
>>> <lst name="explainOther">
>>> <str name="id=650085488,internal_docid=6985">
>>> 1.0473351 = (MATCH) fieldWeight(search_text:ipod in 6985), product of:
>>> 3.0 = tf(termFreq(search_text:ipod)=9)
>>> 3.7238584 = idf(docFreq=522)
>>> 0.09375 = fieldNorm(field=search_text, doc=6985)
>>> </str>
>>> The term frequency is higher for the iPod SKU, so the only difference
>>> working against it is 'fieldNorm', which I do not understand in the context
>>> of relevancy. Does this have to do with omitNorms in some way?
>>> On a related note, I also tried a dismax query with the following line
>>> in it:
>>> <str name="qf">search_text^0.5 brand^10.0 keywords^5.0 title^20.0 sub_title^1.5 model^2.0 attribute^1.1</str>
>>> As an experiment, I boosted the title heavily, since this is where the term
>>> iPod appears most often. It had no effect; in fact, it was not even working.
>>> The title was not being used at all, just search_text, even though I
>>> have it indexed.
>>> Here are the relevant parts of the schema:
>>>  <field name="id" type="string" indexed="true" stored="true" required="true" />
>>>  <field name="brand" type="string" indexed="true" stored="true" />
>>>  <field name="model" type="string" indexed="true" stored="true" />
>>>  <field name="manufacturer_model" type="string" indexed="true" stored="true" />
>>>  <field name="keywords" type="string" indexed="true" stored="false" />
>>>  <field name="title" type="string" indexed="true" stored="true" />
>>>  <field name="sub_title" type="string" indexed="true" stored="true" />
>>>  <field name="attribute" type="string" indexed="true" stored="true" multiValued="true" />
>>>  <field name="type" type="string" indexed="true" stored="true" />
>>>  <field name="description_category" type="string" indexed="true" stored="true" />
>>>  <field name="description" type="string" indexed="true" stored="true" />
>>>  <field name="brand_id" type="string" indexed="false" stored="true" />
>>>  <field name="code" type="string" indexed="false" stored="true" />
>>>  <field name="color" type="string" indexed="true" stored="true" />
>>>  <field name="description_category_id" type="string" indexed="false" stored="true" />
>>>  <field name="display_price" type="sfloat" indexed="false" stored="true" />
>>>  <field name="line_item_price" type="sfloat" indexed="true" stored="true" />
>>>  <field name="main_category" type="string" indexed="true" stored="true" />
>>>  <field name="main_category_id" type="string" indexed="false" stored="true" />
>>>  <field name="regular_price" type="sfloat" indexed="false" stored="true" />
>>>  <field name="sku" type="string" indexed="true" stored="true" />
>>>  <field name="type_id" type="string" indexed="false" stored="true" />
>>>  <field name="upc" type="string" indexed="true" stored="true" />
>>>  <field name="size" type="string" indexed="true" stored="true" />
>>>  <field name="search_text" type="text" indexed="true" stored="false" multiValued="true" termVectors="true"/>
>>>
>>> <defaultSearchField>search_text</defaultSearchField>
>>>
>>>  <copyField source="brand" dest="search_text"/>
>>>  <copyField source="model" dest="search_text"/>
>>>  <copyField source="manufacturer_model" dest="search_text"/>
>>>  <copyField source="keywords" dest="search_text"/>
>>>  <copyField source="title" dest="search_text"/>
>>>  <copyField source="sub_title" dest="search_text"/>
>>>  <copyField source="attribute" dest="search_text"/>
>>>  <copyField source="description_category" dest="search_text"/>
>>>  <copyField source="type" dest="search_text"/>
>>>  <copyField source="description" dest="search_text"/>
>>>  <copyField source="main_category" dest="search_text"/>
>>>  <copyField source="sku" dest="search_text"/>
>>>  <copyField source="upc" dest="search_text"/>
>>> Thanks to all who are willing to take a look at this and help.
>>>
>>> ----------------------------------------------------
>>> Tim Christensen
>>> Director Media & Technology
>>> Vann's Inc.
>>> 406-203-4656
>>>
>>> tim@vanns.com
>>>
>>> http://www.vanns.com
>
>
> ----------------------------------------------------
> Tim Christensen
> Director Media & Technology
> Vann's Inc.
> 406-203-4656
>
> tim@vanns.com
>
> http://www.vanns.com