lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Suman Ghosh" <suman.ghos...@gmail.com>
Subject Boost factor in MultiFieldQueryParser
Date Wed, 17 May 2006 18:08:23 GMT
Hi all,

I am evaluating Lucene 1.9 for a search application. I am using
MultiFieldQueryParser for searching across fields and everything works fine.
However, we have a new requirement where certain fields need to be boosted
while searching. To complicate matters, users can specify fields while
searching (e.g. "title:hybrid content:vehicle").

I came across this enhancement request (
http://issues.apache.org/bugzilla/show_bug.cgi?id=32115) that appears to
address the boosting issue (Please see attached sample code to illustrate
the problem. NewMultiFieldQueryParser is the enhanced version of
MultiFieldQueryParser as per the enhancement request. I used
lucene-core-1.9.1.jar to test the code). For all simple searches (i.e. when
I don't mention a field while searching), it appears to work - explain
output follows:

Hits for "hybrid vehicle" were found in quotes by:
1. Big tax savings on hybrid vehicles
0.36948532 = sum of:
  0.35416904 = sum of:
    0.33885276 = weight(title:hybrid^8.0 in 0), product of:
      0.5510778 = queryWeight(title:hybrid^8.0), product of:
        8.0 = boost
        1.4054651 = idf(docFreq=1)
        0.049012046 = queryNorm
      0.614891 = fieldWeight(title:hybrid in 0), product of:
        1.0 = tf(termFreq(title:hybrid)=1)
        1.4054651 = idf(docFreq=1)
        0.4375 = fieldNorm(field=title, doc=0)
    0.015316265 = weight(content:hybrid^2.0 in 0), product of:
      0.09802409 = queryWeight(content:hybrid^2.0), product of:
        2.0 = boost
        1.0 = idf(docFreq=2)
        0.049012046 = queryNorm
      0.15625 = fieldWeight(content:hybrid in 0), product of:
        1.0 = tf(termFreq(content:hybrid)=1)
        1.0 = idf(docFreq=2)
        0.15625 = fieldNorm(field=content, doc=0)
  0.015316265 = weight(content:vehicle^2.0 in 0), product of:
    0.09802409 = queryWeight(content:vehicle^2.0), product of:
      2.0 = boost
      1.0 = idf(docFreq=2)
      0.049012046 = queryNorm
    0.15625 = fieldWeight(content:vehicle in 0), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.15625 = fieldNorm(field=content, doc=0)

2. Honda Civic
0.006126506 = product of:
  0.012253012 = weight(content:vehicle^2.0 in 1), product of:
    0.09802409 = queryWeight(content:vehicle^2.0), product of:
      2.0 = boost
      1.0 = idf(docFreq=2)
      0.049012046 = queryNorm
    0.125 = fieldWeight(content:vehicle in 1), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.125 = fieldNorm(field=content, doc=1)
  0.5 = coord(1/2)

3. Fuel blends: ethanol
0.017124103 = product of:
  0.034248207 = weight(content:hybrid^2.0 in 2), product of:
    0.09802409 = queryWeight(content:hybrid^2.0), product of:
      2.0 = boost
      1.0 = idf(docFreq=2)
      0.049012046 = queryNorm
    0.34938562 = fieldWeight(content:hybrid in 2), product of:
      2.236068 = tf(termFreq(content:hybrid)=5)
      1.0 = idf(docFreq=2)
      0.15625 = fieldNorm(field=content, doc=2)
  0.5 = coord(1/2)



However, when the search term includes a field (e.g. "title:hybrid
content:vehicle"), boosting for the terms does not seem to work:

Hits for "title:hybrid content:vehicle" were found in quotes by:
1. Big tax savings on hybrid vehicles
0.59159887 = sum of:
  0.5010147 = weight(title:hybrid in 0), product of:
    0.81480247 = queryWeight(title:hybrid), product of:
      1.4054651 = idf(docFreq=1)
      0.5797387 = queryNorm
    0.614891 = fieldWeight(title:hybrid in 0), product of:
      1.0 = tf(termFreq(title:hybrid)=1)
      1.4054651 = idf(docFreq=1)
      0.4375 = fieldNorm(field=title, doc=0)
  0.09058417 = weight(content:vehicle in 0), product of:
    0.5797387 = queryWeight(content:vehicle), product of:
      1.0 = idf(docFreq=2)
      0.5797387 = queryNorm
    0.15625 = fieldWeight(content:vehicle in 0), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.15625 = fieldNorm(field=content, doc=0)

2. Fuel blends: ethanol
0.036233667 = product of:
  0.072467335 = weight(content:vehicle in 1), product of:
    0.5797387 = queryWeight(content:vehicle), product of:
      1.0 = idf(docFreq=2)
      0.5797387 = queryNorm
    0.125 = fieldWeight(content:vehicle in 1), product of:
      1.0 = tf(termFreq(content:vehicle)=1)
      1.0 = idf(docFreq=2)
      0.125 = fieldNorm(field=content, doc=1)
  0.5 = coord(1/2)


Can you please suggest how to tackle the issue?

Thanks

Suman

Mime
View raw message