lucene-java-user mailing list archives

From Rajiv2 <rajiv.roo...@gmail.com>
Subject Re: IDF scoring issue
Date Wed, 17 Dec 2008 03:43:43 GMT

To answer your questions:
1. Each document I'm searching contains only two words, a city and a state
abbreviation, lowercased and analyzed by WhitespaceAnalyzer.
2. The only field, and the default field, is text, so the query becomes
text:fleming text:roofing text:inc. ...etc.
   Using the query operator AND instead of OR gives me no results, which does
not help.
3. I've been using explain in Luke, and the only difference between "fleming
ga" and "marietta ga" is that the idf value is higher for "fleming" ... that's
why "fleming ga" has a higher score.
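
As a rough check on point 3, here is a minimal sketch that recomputes idf by hand for the six documents listed in the original post. It assumes the classic DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)). The class and method names (IdfSketch, docFreq, idf) are made up for illustration, and the code does not call Lucene itself.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: recomputes idf by hand for the six documents in the original post.
// Assumption: idf = 1 + ln(numDocs / (docFreq + 1)), as in classic DefaultSimilarity.
class IdfSketch {

    // The six indexed values from the original post, one string per document.
    static final List<String> DOCS = Arrays.asList(
        "fleming ga", "marietta ga", "marietta il",
        "marietta ok", "marietta ok", "fleming pa");

    // Number of documents that contain the term (whitespace-tokenized).
    static int docFreq(String term) {
        int count = 0;
        for (String doc : DOCS) {
            if (Arrays.asList(doc.split("\\s+")).contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Inverse document frequency under the assumed formula.
    static double idf(String term) {
        return 1.0 + Math.log(DOCS.size() / (double) (docFreq(term) + 1));
    }

    public static void main(String[] args) {
        for (String term : new String[] {"fleming", "marietta", "ga"}) {
            System.out.println(term + " docFreq=" + docFreq(term) + " idf=" + idf(term));
        }
    }
}
```

With this index, "fleming" and "ga" each occur in 2 of the 6 documents and "marietta" in 4, so under the assumed formula idf comes out at roughly 1.69 for "fleming" and 1.18 for "marietta". Both candidate docs match "ga", so the rarer city name is what tips the ranking.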

Basically I'm just trying to get the "marietta ga" doc to score higher, since
in the query text those two words are closer together than "fleming" and "ga".
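
To illustrate what "closer together" means here, the following sketch measures the distance in token positions between two words of the search text. It assumes whitespace tokenization and lowercasing, as described above. ProximitySketch, tokens and distance are hypothetical names, not Lucene API. As far as I understand, a plain OR of term clauses does not use these positions when scoring, which would explain why the distance has no effect on the ranking.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: token-position distance between two words of the search text.
// Assumption: tokens are produced by lowercasing and splitting on whitespace.
class ProximitySketch {

    static final String SEARCH_TEXT = "fleming roofing inc., marietta ga";

    // Lowercase the text and split it on runs of whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Distance between the first occurrences of a and b, or -1 if either is missing.
    static int distance(String text, String a, String b) {
        List<String> t = tokens(text);
        int i = t.indexOf(a);
        int j = t.indexOf(b);
        return (i < 0 || j < 0) ? -1 : Math.abs(i - j);
    }

    public static void main(String[] args) {
        System.out.println("fleming..ga  = " + distance(SEARCH_TEXT, "fleming", "ga"));
        System.out.println("marietta..ga = " + distance(SEARCH_TEXT, "marietta", "ga"));
    }
}
```

For this search text, "marietta" and "ga" are adjacent (distance 1), while "fleming" and "ga" are four positions apart.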

rajiv



Erick Erickson wrote:
> 
> Note a couple of things:
> 
> 1> how a doc scores also takes into account how many other words
>      are in the field you're querying on.
> 2> Is "text" your default field? Because what you posted is really
>      searching text:fleming <default field>:roofing <default field>:inc......
>      Note also the implicit OR between each of them. Is this really your
>      intent?
> 3> query.explain (as I remember) is your friend to figure out how the
>      weights are being calculated. If you haven't got a copy of Luke, I'd
>      *strongly* advise getting one and looking at the "explain" tab...
> 
> Best
> Erick
> 
> On Tue, Dec 16, 2008 at 8:19 PM, Rajiv2 <rajiv.roopan@gmail.com> wrote:
> 
>>
>> Hello,
>>
>> I'm using the default lucene Queryparser on the search text : fleming
>> roofing inc., marietta ga
>>
>> These items are in my index.
>>
>> doc 1: fleming ga
>> doc 2: marietta ga
>> doc 3: marietta il
>> doc 4: marietta ok
>> doc 5: marietta ok
>> doc 6: fleming pa
>>
>> The first match is always "fleming ga" even though "marietta ga" is
>> closer together in the search text. I'm assuming this is because
>> "fleming" has a higher idf than "marietta". What should I change in the
>> way I'm querying or indexing to make "marietta ga" rank first?
>>
>> Also, I don't want to modify the search text by putting quotes around
>> "marietta ga", which would force the query parser to make a phrase query.
>>
>> thanks,
>> Rajiv
>> --
>> View this message in context:
>> http://www.nabble.com/IDF-scoring-issue-tp21045385p21045385.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/IDF-scoring-issue-tp21045385p21046615.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

