From: "Suman Ghosh"
To: java-user
Date: Wed, 17 May 2006 14:08:23 -0400
Subject: Boost factor in MultiFieldQueryParser

Hi all,

I am evaluating Lucene 1.9 for a search application. I am using MultiFieldQueryParser to search across fields, and everything works fine. However, we have a new requirement: certain fields need to be boosted while searching. To complicate matters, users can specify fields while searching (e.g. "title:hybrid content:vehicle").

I came across this enhancement request (http://issues.apache.org/bugzilla/show_bug.cgi?id=32115) that appears to address the boosting issue. Please see the attached sample code, which illustrates the problem. NewMultiFieldQueryParser is the enhanced version of MultiFieldQueryParser as per the enhancement request. I used lucene-core-1.9.1.jar to test the code.

For all simple searches (i.e. when I don't mention a field while searching), it appears to work. The explain output follows:

Hits for "hybrid vehicle" were found in quotes by:

1. Big tax savings on hybrid vehicles
0.36948532 = sum of:
  0.35416904 = sum of:
    0.33885276 = weight(title:hybrid^8.0 in 0), product of:
      0.5510778 = queryWeight(title:hybrid^8.0), product of:
        8.0 = boost
        1.4054651 = idf(docFreq=1)
        0.049012046 = queryNorm
      0.614891 = fieldWeight(title:hybrid in 0), product of:
        1.0 = tf(termFreq(title:hybrid)=1)
        1.4054651 = idf(docFreq=1)
        0.4375 = fieldNorm(field=title, doc=0)
    0.015316265 = weight(content:hybrid^2.0 in 0), product of:
      0.09802409 = queryWeight(content:hybrid^2.0), product of:
        2.0 = boost
        1.0 = idf(docFreq=2)
        0.049012046 = queryNorm
      0.15625 = fieldWeight(content:hybrid in 0), product of:
        1.0 = tf(termFreq(content:hybrid)=1)
        1.0 = idf(docFreq=2)
        0.15625 = fieldNorm(field=content, doc=0)
  0.015316265 = weight(content:vehicle^2.0 in 0), product of:
    0.09802409 = queryWeight(content:vehicle^2.0), product of:
      2.0 = boost
      1.0 = idf(docFreq=2)
      0.049012046 = queryNorm
    0.15625 = fieldWeight(content:vehicle in 0), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.15625 = fieldNorm(field=content, doc=0)

2. Honda Civic
0.006126506 = product of:
  0.012253012 = weight(content:vehicle^2.0 in 1), product of:
    0.09802409 = queryWeight(content:vehicle^2.0), product of:
      2.0 = boost
      1.0 = idf(docFreq=2)
      0.049012046 = queryNorm
    0.125 = fieldWeight(content:vehicle in 1), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.125 = fieldNorm(field=content, doc=1)
  0.5 = coord(1/2)

3. Fuel blends: ethanol
0.017124103 = product of:
  0.034248207 = weight(content:hybrid^2.0 in 2), product of:
    0.09802409 = queryWeight(content:hybrid^2.0), product of:
      2.0 = boost
      1.0 = idf(docFreq=2)
      0.049012046 = queryNorm
    0.34938562 = fieldWeight(content:hybrid in 2), product of:
      2.236068 = tf(termFreq(content:hybrid)=5)
      1.0 = idf(docFreq=2)
      0.15625 = fieldNorm(field=content, doc=2)
  0.5 = coord(1/2)

However, when the search string names a field (e.g. "title:hybrid content:vehicle"), the boosts do not seem to be applied to the terms:

Hits for "title:hybrid content:vehicle" were found in quotes by:

1. Big tax savings on hybrid vehicles
0.59159887 = sum of:
  0.5010147 = weight(title:hybrid in 0), product of:
    0.81480247 = queryWeight(title:hybrid), product of:
      1.4054651 = idf(docFreq=1)
      0.5797387 = queryNorm
    0.614891 = fieldWeight(title:hybrid in 0), product of:
      1.0 = tf(termFreq(title:hybrid)=1)
      1.4054651 = idf(docFreq=1)
      0.4375 = fieldNorm(field=title, doc=0)
  0.09058417 = weight(content:vehicle in 0), product of:
    0.5797387 = queryWeight(content:vehicle), product of:
      1.0 = idf(docFreq=2)
      0.5797387 = queryNorm
    0.15625 = fieldWeight(content:vehicle in 0), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.15625 = fieldNorm(field=content, doc=0)

2. Fuel blends: ethanol
0.036233667 = product of:
  0.072467335 = weight(content:vehicle in 1), product of:
    0.5797387 = queryWeight(content:vehicle), product of:
      1.0 = idf(docFreq=2)
      0.5797387 = queryNorm
    0.125 = fieldWeight(content:vehicle in 1), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.125 = fieldNorm(field=content, doc=1)
  0.5 = coord(1/2)

Can you please suggest how to tackle the issue?

Thanks
Suman

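For readers following the numbers, the per-term weights in the explain output are plain products of the factors listed under each node. The sketch below only replays that arithmetic for title:hybrid in document 0, once with the 8.0 boost and once without. It does not call Lucene; the class and method names are hypothetical, and treating the unboosted case as a boost of 1.0 is an assumption, since the second explain output lists no boost factor at all.

```java
public class ExplainArithmetic {
    // queryWeight = boost * idf * queryNorm, the factors listed under "queryWeight(...)".
    static float queryWeight(float boost, float idf, float queryNorm) {
        return boost * idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm, the factors listed under "fieldWeight(...)".
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // weight = queryWeight * fieldWeight, matching the "weight(...), product of:" lines.
    static float termWeight(float boost, float idf, float queryNorm, float tf, float fieldNorm) {
        return queryWeight(boost, idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        // Values copied from the explain output for title:hybrid in doc 0.
        float boosted = termWeight(8.0f, 1.4054651f, 0.049012046f, 1.0f, 0.4375f);
        // Assumption: no boost line in the explain output is treated as boost = 1.0.
        float unboosted = termWeight(1.0f, 1.4054651f, 0.5797387f, 1.0f, 0.4375f);
        System.out.println("title:hybrid^8.0 weight = " + boosted);
        System.out.println("title:hybrid weight     = " + unboosted);
    }
}
```

The two results should come out close to the 0.33885276 and 0.5010147 reported in the explain output. The raw scores of the two queries are not directly comparable because queryNorm differs (0.049012046 versus 0.5797387); the missing boost factor under queryWeight is what shows that no boost was applied in the fielded case.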