lucene-dev mailing list archives

From "Schwenker, Stephen" <SSchwen...@thestar.ca>
Subject RE: Date Boosting
Date Fri, 31 Mar 2006 15:28:41 GMT
Hey Erik,

I've got a few more questions. As you've probably realized, I'm somewhat new to Lucene.
We're using it to build a new search engine for our site.

From the sound of your first paragraph, I can calculate the date boost and set it at indexing
time, so I wrote a program that cycles through the documents, sets the boost on the "pubdate"
field to the value from the equation I gave before, and writes the documents to a new index.
Then I searched both indexes and compared them with explain(), and the results are exactly the
same for both. The field boost on "pubdate" is not being taken into consideration, probably
because the search query never matches the pubdate field: it holds a date, not content to be
searched. Is there something I need to do to make the search always count the "pubdate"
field boost in the scoring? Or should I continue the way I was going and create a module
to do this dynamically?

I guess what I'm really after is document boosting. But I don't want to set the document
boost directly, because I want to keep the boosting separate for different fields. For example,
say I have three fields: pubdate, category, and source. I want all three to help rank
the content even though none of them will be one of the default search fields.
I could take the field boosts for each document and calculate a document boost from them,
but then every time a document changes, I have to recalculate its whole document boost
rather than just setting the new boost for the field that changed. What I'm trying to do is
keep the concerns of the different fields separate.
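One way to keep the per-field signals separate while still producing the single value that Document.setBoost(float) expects is to store each factor on its own and derive the document boost as their product whenever one changes. The sketch below assumes that multiplicative combination; CompositeBoost, setFactor, and documentBoost are hypothetical names, not Lucene API. The changed document would still have to be re-indexed with the recomputed boost.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): keeps each ranking signal's
// factor separately and derives one document-level boost from them.
public class CompositeBoost {
    private final Map<String, Float> factors = new LinkedHashMap<>();

    // Record or replace the factor for one signal, e.g. "pubdate".
    public void setFactor(String signal, float factor) {
        if (factor <= 0f) {
            throw new IllegalArgumentException("boost factors must be positive");
        }
        factors.put(signal, factor);
    }

    // Product of all recorded factors; signals never set contribute 1.0.
    public float documentBoost() {
        float product = 1.0f;
        for (float f : factors.values()) {
            product *= f;
        }
        return product;
    }

    public static void main(String[] args) {
        CompositeBoost boost = new CompositeBoost();
        boost.setFactor("pubdate", 1.25f);
        boost.setFactor("category", 2.0f);
        boost.setFactor("source", 0.5f);
        System.out.println(boost.documentBoost()); // 1.25
        // Only the category signal changed; the others are untouched.
        boost.setFactor("category", 1.0f);
        System.out.println(boost.documentBoost()); // 0.625
    }
}
```

The recomputation is still needed, but it becomes a mechanical product over stored factors rather than a value each caller has to reconstruct by hand.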

Maybe I'm going about this the wrong way. If you think I am, let me know. I now realize
that this question belongs on the Lucene users list, but I started it here because I was
going to write a new module for this, since I couldn't get Lucene to do it for me.
I'm going to look at FunctionQuery now and see what it can do.

Thank you,


Steve.





-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Thursday, March 30, 2006 2:10 PM
To: java-dev@lucene.apache.org
Subject: Re: Date Boosting



On Mar 30, 2006, at 11:28 AM, Schwenker, Stephen wrote:
> Maybe I'm not being clear enough. I'm not simply looking to
> boost a field by a fixed amount, and the field is unlikely to
> match a word in the query because we won't be searching
> for dates. That means the field's boost will not be taken
> into consideration, because there is no match.

At indexing time, you can boost a document and/or any of its fields.
Those factors are all multiplied together to form a single document-level
boost factor. So a field boosted at indexing time would not
need to be part of the query itself to boost found documents.
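For reference, the Similarity documentation for Lucene's classic scoring describes the index-time norm as the document boost times the field's length norm times the boosts of the fields with that name, and stores that norm per field. If that model holds for the version in use, a boost on "pubdate" would only influence scores when a query term matches "pubdate", which would be consistent with the unchanged explain() output Steve reports. The following is a minimal, self-contained Java model of that arithmetic, assuming DefaultSimilarity's 1/sqrt(numTerms) length norm; it does not call Lucene, and NormModel and norm are hypothetical names.

```java
// Simplified model of the index-time norm in Lucene's classic scoring:
//   norm = docBoost * fieldBoost * 1/sqrt(numTermsInField)
// Assumptions: one field instance per name (Lucene multiplies the boosts of
// all same-named fields) and no lossy one-byte norm encoding.
public class NormModel {
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return (float) (docBoost * fieldBoost * (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        float docBoost = 1.0f;
        // A query matching only terms in "contents" reads only this norm.
        float contentsNorm = norm(docBoost, 1.0f, 100);
        // Boosting "pubdate" changes only the norm stored for "pubdate".
        float pubdateNorm = norm(docBoost, 1.5f, 1);
        System.out.println(contentsNorm); // 0.1
        System.out.println(pubdateNorm);  // 1.5
        // A document boost, by contrast, scales the norm of every field.
        System.out.println(norm(1.5f, 1.0f, 100)); // 0.15
    }
}
```

Under this model, a document-wide signal has to go into the document boost (or into a query-time function) rather than into the boost of a field that queries never touch.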

> For example, all my documents have a publication date, and I want
> newer documents to be ranked slightly higher than older ones.
> Say I am searching for the word "Lucene" and it returns a list of
> 10 documents. The third one was written today, and the first and
> second were written 2 months ago but are ranked slightly
> higher because of their scores. Each of these documents has a date
> field ("pubdate") with the following values:
>
> 20060130
> 20060210
> 20060329
>
> Now, I want to turn these dates into numbers, multiply them by a
> factor, and add them to the total score, e.g.
>
> Date -> (value in milliseconds) / (milliseconds per day) * (daily factor) = (boost result)
>
> 20060130 -> 1138597200000 / (1000 * 60 * 60 * 24) * 0.00002 = 0.26356416...
> 20060210 -> 1139547600000 / (1000 * 60 * 60 * 24) * 0.00002 = 0.26378416...
> 20060329 -> 1143608400000 / (1000 * 60 * 60 * 24) * 0.00002 = 0.26472416...
>
> This is the current equation I'm hoping to use, but I haven't quite
> worked it out. If I can add the boost result to the final score,
> I'm hoping more recent articles will get a slightly higher
> ranking.
>
> I hope you understand what I'm trying to accomplish and maybe you  
> can help me figure out where I should look.
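The arithmetic in the table above can be reproduced with the java.time API. This sketch assumes each yyyyMMdd date is taken at midnight in UTC-5 (EST), since that offset yields exactly the millisecond values quoted; PubDateBoost and its methods are hypothetical names.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical reproduction of the date-boost formula from the message:
//   boost = (epoch millis) / (millis per day) * 0.00002
public class PubDateBoost {
    static final double MILLIS_PER_DAY = 1000.0 * 60 * 60 * 24;
    static final double DAILY_FACTOR = 0.00002;
    // Assumption: dates are interpreted at local midnight in UTC-5.
    static final ZoneOffset ZONE = ZoneOffset.ofHours(-5);

    // Parse a yyyyMMdd value such as "20060130" into epoch milliseconds.
    static long toEpochMillis(String yyyymmdd) {
        LocalDate date = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return date.atStartOfDay(ZONE).toInstant().toEpochMilli();
    }

    static double boost(String yyyymmdd) {
        return toEpochMillis(yyyymmdd) / MILLIS_PER_DAY * DAILY_FACTOR;
    }

    public static void main(String[] args) {
        for (String d : new String[] {"20060130", "20060210", "20060329"}) {
            System.out.printf("%s -> %d -> %.8f%n", d, toEpochMillis(d), boost(d));
        }
    }
}
```

Note the spread: January 30 to March 29 is 58 days, so the boosts differ by only 58 * 0.00002 = 0.00116, while every document receives roughly 0.26 regardless of age. Because the value tracks the absolute date rather than the document's age, only the difference between dates carries a recency signal.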


Since you need to factor in something based on the
difference between *today* and the publication date, I think FunctionQuery
is perhaps what you're after. It is currently part of Solr:

	<http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/FunctionQuery.html>
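As a rough illustration of a factor based on the difference between today and the publication date, the sketch below uses a reciprocal decay, 1 / (1 + ageDays / halfLifeDays), which is 1.0 for a document published today and 0.5 at halfLifeDays. The formula, the halfLifeDays parameter, the multiplicative combination with the text score, and the name RecencyBoost are all assumptions chosen for illustration; this is plain Java, not Solr's FunctionQuery API.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical age-based recency factor (not Solr or Lucene API).
public class RecencyBoost {
    // 1.0 for a document published today, 0.5 when it is halfLifeDays old.
    // Future-dated documents are clamped to age 0.
    static double recency(LocalDate published, LocalDate today, double halfLifeDays) {
        long ageDays = Math.max(0, ChronoUnit.DAYS.between(published, today));
        return 1.0 / (1.0 + ageDays / halfLifeDays);
    }

    // One possible combination: scale the text-relevance score by recency.
    static double combined(double textScore, LocalDate published, LocalDate today,
                           double halfLifeDays) {
        return textScore * recency(published, today, halfLifeDays);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2006, 3, 29); // fixed for reproducibility
        double halfLife = 30.0;                       // assumption: tune per site
        for (String s : new String[] {"2006-01-30", "2006-02-10", "2006-03-29"}) {
            LocalDate d = LocalDate.parse(s);
            System.out.printf("%s recency %.3f%n", s, recency(d, today, halfLife));
        }
    }
}
```

With today fixed at 2006-03-29 and a 30-day half-life, the three example dates get factors of about 0.341, 0.390, and 1.000. Multiplying keeps the effect proportional to relevance; adding a term instead would require choosing its scale relative to typical scores. In a live system, today would come from LocalDate.now().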

Erik




>
> Thank you,
>
>
> Steve.
>
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Thursday, March 30, 2006 9:51 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Date Boosting
>
>
>
> On Mar 30, 2006, at 8:50 AM, Schwenker, Stephen wrote:
>> I'm new to Lucene and I want to make a query that dynamically boosts a
>> document slightly based on a date field. I'm not sure which
>> classes are used to calculate the boost, so which classes
>> should I extend to accomplish this? I'm just asking so I
>> can get to the job faster. I don't want to waste my time looking
>> in places I don't need to.
>
> Extending classes is not necessary.  To boost a date field you can
> simply call Field.setBoost().   Use IndexSearcher.explain() to see
> how your boosts affect scoring.
>
> Since you might need dynamic date boosting, perhaps the new
> FunctionQuery would be more what you're after, though?
>
> 	Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org