lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: Dates and others
Date Mon, 01 Dec 2003 22:27:52 GMT
Dion Almaer wrote:
> Interesting.  I implemented an approach which boosted based on the number of months in
the past, and
> after tweaking the boost amounts, it seems to do the job. I do a fresh reindex every
night (since
> the indexing process takes no time at all... unlike our old search solution!)

If you're reindexing every night, then document boosting should work 
well.  The other approaches I mentioned would only be required if you 
can't afford to re-index frequently.

> I read content for the index from different sources. Sometimes the source gives me documents
loosely
> in date order, but not all of them. So, it seems that one of the other approaches should
be taken
> (adding a month/week field etc).  I should look more into the HitCollector and see how
it can help
> me.

I wouldn't bother to pursue these approaches.  Document boosting should 
work well for you, since you're reindexing.

> The other issue I have is that I would like to prioritize the title field.  At the moment
I am lazy
> and add the title to the body (contents = title + body) which seems to be OK... however
sometimes
> something that mentions the search term in the title should appear higher up in the pecking
order.
> 
> I am using the QueryParser (subclassed to disallow wildcards etc) to do the dirty work
for me.
> Should I get away from this and manage the queries myself (and run a Multi against the
title field
> as well as the contents?

A separate title field will solve this for you.  You can, as you 
suggest, boost title clauses at query time.  Alternately, you could 
boost title fields at index time, although that's less flexible.

Note that if you put titles in a separate field and search both the 
title and the body field then title matches will tend to be naturally 
boosted, since titles tend to be shorter than bodies and hence are less 
normalized.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message