lucene-solr-user mailing list archives

From Joe Zhang <smartag...@gmail.com>
Subject Re: Question about field boost
Date Tue, 23 Jul 2013 18:34:12 GMT
I'm not sure I understand, Erick. I don't have a "text" field in my schema;
"title" and "content" are both legal fields.


On Tue, Jul 23, 2013 at 5:15 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> this isn't doing what you think.
> title^10 content
> is actually parsed as
>
> text:title^100 text:content
>
> where "text" is my default search field.
>
> assuming title is a field. If you look a little
> farther up the debug output you'll see that.
>
> You probably want
> title:content^100 or some such?
>
> Erick
>
> On Tue, Jul 23, 2013 at 1:43 AM, Jack Krupansky <jack@basetechnology.com>
> wrote:
> > That means that for that document "china" occurs in the title vs. "snowden"
> > found in a document but not in the title.
> >
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Joe Zhang
> > Sent: Tuesday, July 23, 2013 12:52 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Question about field boost
> >
> >
> > Is my reading correct that the boost is only applied on "china" but not
> > "snowden"? How can that be?
> >
> > My query is: q=china+snowden&qf=title^10 content
> >
> >
> > On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang <smartagent@gmail.com> wrote:
> >
> >> Thanks for your hint, Jack. Here are the debug results, which I'm having a
> >> hard time deciphering (the two terms are "china" and "snowden")...
> >>
> >> 0.26839527 = (MATCH) sum of:
> >>   0.26839527 = (MATCH) sum of:
> >>     0.26757246 = (MATCH) max of:
> >>       7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
> >>         0.019873314 = queryWeight(content:china), product of:
> >>           1.6649085 = idf(docFreq=46832, maxDocs=91058)
> >>           0.01193658 = queryNorm
> >>         0.039825942 = (MATCH) fieldWeight(content:china in 249), product of:
> >>           4.8989797 = tf(termFreq(content:china)=24)
> >>           1.6649085 = idf(docFreq=46832, maxDocs=91058)
> >>           0.0048828125 = fieldNorm(field=content, doc=249)
> >>       0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
> >>         0.5836803 = queryWeight(title:china^10.0), product of:
> >>           10.0 = boost
> >>           4.8898454 = idf(docFreq=1861, maxDocs=91058)
> >>           0.01193658 = queryNorm
> >>         0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
> >>           1.0 = tf(termFreq(title:china)=1)
> >>           4.8898454 = idf(docFreq=1861, maxDocs=91058)
> >>           0.09375 = fieldNorm(field=title, doc=249)
> >>     8.2282536E-4 = (MATCH) max of:
> >>       8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
> >>         0.03407834 = queryWeight(content:snowden), product of:
> >>           2.8549502 = idf(docFreq=14246, maxDocs=91058)
> >>           0.01193658 = queryNorm
> >>         0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product of:
> >>           1.7320508 = tf(termFreq(content:snowden)=3)
> >>           2.8549502 = idf(docFreq=14246, maxDocs=91058)
> >>           0.0048828125 = fieldNorm(field=content, doc=249)
> >>
> >>
> >> On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky
> >> <jack@basetechnology.com> wrote:
> >>
> >>> Maybe you're not doing anything wrong - other than having an artificial
> >>> expectation of what the true relevance of your data actually is. Many
> >>> factors go into relevance scoring. You need to look at all aspects of
> >>> your data.
> >>>
> >>> Maybe your terms don't occur in your titles the way you think they do.
> >>>
> >>> Maybe you need a boost of 500 or more...
> >>>
> >>> Lots of potential maybes.
> >>>
> >>> Relevancy tuning is an art and craft, hardly a science.
> >>>
> >>> Step one: Know your data, inside and out.
> >>>
> >>> Use the debugQuery=true parameter on your queries and see how much of
> >>> the score is dominated by your query terms in the non-title fields.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -----Original Message----- From: Joe Zhang
> >>> Sent: Monday, July 22, 2013 11:06 PM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Question about field boost
> >>>
> >>>
> >>> Dear Solr experts:
> >>>
> >>> Here is my query:
> >>>
> >>> defType=dismax&q=term1+term2&qf=title^100 content
> >>>
> >>> Apparently (at least I thought) my intention is to boost the title
> >>> field. While I'm getting some non-trivial results, I'm surprised that
> >>> the documents with both term1 and term2 in title (I know such docs do
> >>> exist in my repository) were not returned (or maybe ranked very low).
> >>> The situation does not change even when I use much larger boost factors.
> >>>
> >>> What am I doing wrong?
> >>>
> >>
> >>
> >
>
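
The debug output quoted above can be checked by hand. The following is only a sketch of the arithmetic, assuming the classic Lucene TF-IDF formula (tf is the square root of the term frequency) and a dismax tie breaker of 0, which is what the plain "max of" lines suggest. The function and variable names are made up for illustration; the numbers are copied from the explain output for doc 249.

```python
import math

# queryNorm as reported in the explain output.
QUERY_NORM = 0.01193658


def term_score(term_freq, idf, field_norm, boost=1.0):
    """Score of one term in one field: queryWeight * fieldWeight."""
    query_weight = boost * idf * QUERY_NORM                 # boost * idf * queryNorm
    field_weight = math.sqrt(term_freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# "china" matches in both fields; only the title clause carries the ^10 boost.
china_content = term_score(24, 1.6649085, 0.0048828125)
china_title = term_score(1, 4.8898454, 0.09375, boost=10.0)

# "snowden" matches in content only, so no boosted title clause exists for it.
snowden_content = term_score(3, 2.8549502, 0.0048828125)

# With tie=0 (assumed), dismax keeps the best field per term, then sums the terms.
china = max(china_content, china_title)
snowden = max([snowden_content])
total = china + snowden

print(f"china={china:.8f} snowden={snowden:.8f} total={total:.8f}")
```

Read this way, the boost is applied to every term that matches in the title. "snowden" gets no boost for this document simply because it does not occur in the title, so only its content score is counted.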

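For anyone reproducing the request, here is a minimal sketch of how the parameters from this thread could be assembled and URL-encoded. The host, port, and core name are placeholders and not taken from the thread; only defType, q, qf, and debugQuery appear in the messages above.

```python
from urllib.parse import urlencode, parse_qs

# Parameters mentioned in the thread. Without defType=dismax the qf
# parameter is not used by the default query parser.
params = {
    "defType": "dismax",
    "q": "china snowden",
    "qf": "title^10 content",
    "debugQuery": "true",
}

# urlencode escapes "^" as %5E and encodes spaces as "+".
query_string = urlencode(params)

# Hypothetical endpoint: replace host, port, and core with your own.
url = "http://localhost:8983/solr/collection1/select?" + query_string
print(url)
```

Encoding the parameters this way keeps the space inside qf from being confused with a parameter separator, which makes it easier to compare the request with the parsed query shown in the debug output.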