lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shi Hui Liu <shi...@nextbio.com>
Subject RE: Clarity: Is there a Query boosting 50-50 over 1000-1 ?
Date Fri, 29 Aug 2008 17:48:38 GMT
Sorry to mislead you. The query output like: +((TITLE:A | BODY:A) (TITLE:B | BODY:B)).
Let me explain it. Say S(A) means score of query A; TF(A) means term freq of A in the current
document.
Currently I'm using BooleanQuery to combine them to a query, if TF(A)=1000 and TF(B)=1 in
document D1, TF(A)=50 and TF(B)=50 in document D2, then usually D1 has higher score than D2.
But I want D2 show on top of D1 since D2 is more significant from our customer's point of
view. So my question is if there is an existing query can handle this case or I should create
a new query class. My idea is compute final score with S(A) * S(B) instead of S(A) + S(B)
like BooleanQuery. If I have to create my own query, could you give my some advices?

Thank you so much,
Shi Hui


-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Thursday, August 28, 2008 1:34 PM
To: java-user@lucene.apache.org
Subject: Re: Clarity: Is there a Query boosting 50-50 over 1000-1 ?

Can you share your query generation code?  Your description doesn't
make sense to me and I wonder how you are creating and running the
searches.

Can you run the explain() method on your documents?

Also, FWIW, it sounds like you are prematurely optimizing.   For every
query you ever do in your entire life (or until search engines are
capable of reading your mind, and even then, I'm not sure) on any
search system, you will be able to find at least one result that you
think is better than one that is ranked higher.  Now, if you told me
you had 50 queries that all scored badly, that is different.


On Aug 28, 2008, at 1:37 PM, Shi Hui Liu wrote:

> Hi Grant,
>
> Thank you for your help. My query is A AND B. The problem is if I
> use BooleanQuery, I got score 120 from TermQuery(A) and 0.5 from
> TermQuery(B) for the first article; for second article, I got score
> 27 from TermQuery(A) and 36 from TermQuery(B). From my point of
> view, I think the second article is better than the first one,
> however, the result is otherwise. I'm thinking to create a Query to
> multiply the score of A and B instead of sum. But I want to know if
> there is an existing query doing same thing since I'm still a new
> user to Lucene.
>
> Thank you in advance,
> Shi Hui
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Thursday, August 28, 2008 5:57 AM
> To: java-user@lucene.apache.org
> Subject: Re: Clarity: Is there a Query boosting 50-50 over 1000-1 ?
>
>
> On Aug 27, 2008, at 7:34 PM, Shi Hui Liu wrote:
>
>> Hi,
>>
>> I think I should clarify my question a little bit. I'm using
>> BooleanQuery to combine TermQuery(A) and TermQuery(B). But I'm not
>> satisfied with its scoring algorigthm. Is there other queries can
>> boost up the documents with 50 of A and 50 of B on top of documents
>> with 1000 of A and 1 of B?
>
> Is your query A + B meant to be A OR B or A AND B?  That is, are both
> terms required?  You notation suggests they are, but the description
> suggests you are getting documents that have only A in them, which
> suggests "OR".
>
> Have you looked at the explains?  What about the scoring aren't you
> happy with?  It's not perfect (there is no such thing) but it works
> pretty well in most cases, and works great if you spend a little time
> figuring out the right length normalization factors.
>
>
>> And I'm looking at the source code and found lots of classes are not
>> public and some important methods are protected. What's the reason?
>> Why make them public and let users to customize the Query easily?
>
> Because there not meant to be overridden, but of course we are open to
> specific suggestions on things that should be made public and often do
> this when someone shows a valid reason.
>
> Cheers,
> Grant
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message