lucene-java-user mailing list archives

From Ype Kingma <>
Subject Re: Dates and others
Date Sun, 23 Nov 2003 19:02:59 GMT
On Saturday 22 November 2003 18:33, Dion Almaer wrote:
> 1. The power of dates:
>    I am fairly happy with the results of queries on my index.  The only
> issue I have is that at the moment the date of the content isn't considered
> (since lucene doesn't know about it).  Is there a good way in which the
> date of the content could be used to help with the scoring?  So more recent
> content shows up higher in the stack.  I have a date keyword field, but it
> isn't part of the query itself.  Are there any patterns to help with this?

You can use the Lucene date field, or a keyword field, e.g. in yyyymmdd
format. However, Lucene's scoring is not based on the value of
a matching term; it is based on term frequencies in documents, on
the number of documents in the index containing the term, and
on the distance between terms (for proximity queries).
You cannot make the document score depend directly on the value of
a (date) field in the document.
Btw, how big would you want the date influence to be in the score?
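A minimal sketch of the yyyymmdd keyword idea in plain Java (standard library only, no Lucene API). DateKeys, toKeyword and utcDate are hypothetical names, and formatting in UTC is an assumption. The point of the fixed-width, most-significant-first format is that comparing or sorting the terms as strings gives chronological order.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class DateKeys {
    // Format a date as a yyyyMMdd keyword term, e.g. 2003-11-23 -> "20031123".
    // UTC is assumed so the same instant always yields the same term.
    static String toKeyword(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    // Build a midnight-UTC date; month is 1-based here.
    static Date utcDate(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(year, month - 1, day); // Calendar months are 0-based
        return cal.getTime();
    }

    public static void main(String[] args) {
        String older = toKeyword(utcDate(2003, 2, 5));
        String newer = toKeyword(utcDate(2003, 11, 23));
        System.out.println(older + " " + newer);        // 20030205 20031123
        // Fixed width, most significant part first: string order == date order.
        System.out.println(older.compareTo(newer) < 0);  // true
    }
}
```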

Sorting results by date has been discussed in the past; see the archives.
In that case you lose the document scores.
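To illustrate that trade-off, here is a plain-Java sketch (no Lucene sorting API assumed) of reordering already-collected hits newest first. Hit and newestFirst are hypothetical, and it assumes each hit's yyyymmdd date term has already been fetched from a stored field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByDate {
    // Hypothetical hit: document id, relevance score, yyyyMMdd date term.
    static final class Hit {
        final int doc;
        final float score;
        final String date;
        Hit(int doc, float score, String date) {
            this.doc = doc;
            this.score = score;
            this.date = date;
        }
        @Override
        public String toString() { return doc + "@" + date; }
    }

    // Reorder hits (assumed to arrive in descending score order) newest first.
    // List.sort is stable, so hits sharing a date keep their score order;
    // apart from that tie-break, the score no longer affects the ranking.
    static List<Hit> newestFirst(List<Hit> byScore) {
        List<Hit> sorted = new ArrayList<>(byScore);
        sorted.sort(Comparator.comparing((Hit h) -> h.date).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(7, 0.91f, "20030205"));
        hits.add(new Hit(3, 0.55f, "20031123"));
        hits.add(new Hit(9, 0.40f, "20030205"));
        System.out.println(newestFirst(hits)); // [3@20031123, 7@20030205, 9@20030205]
    }
}
```

Document 3 jumps to the top despite having a lower score than document 7.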

> 2. +field:foo and the QueryParser:
>    I ran into some problems where using +field:foo was giving strange
> results.  When I changed the queries to "... AND field:foo" everything was
> fine.
>    Am I missing something there?

Which version of Lucene are you using? There have been
some fixes in the query parser of Lucene 1.2, but I don't know 
precisely which.

> 3. I have some fields such as title, owner, etc. as well as the content blob
> which I index and use as the default search field.  Is there an easy way to
> extend the QueryParser to merge it with a MultiTermQuery which can also
> search this meta data and give them certain weights?  Or, if you go down

You can provide field weights at document indexing time (norms) and use a
MultiTermQuery (strictly, a BooleanQuery with one clause per field, as built
by MultiFieldQueryParser) for searching multiple fields. At query time you
can again use field weights.
I don't know how the scoring of the MultiTermQuery is done;
it might use the maximum score over the fields of a document, or combine the
scores in the fields of a document.
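One way to keep using the QueryParser with query-time field weights is to expand the user's text into QueryParser syntax before parsing, using the `field:term^boost` notation. This is a minimal sketch. FieldBoostQuery, expand and exampleBoosts are hypothetical. The field names title, owner and content (the last standing in for the default content field) and the boost values are assumptions. Phrases, operators and escaping are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldBoostQuery {
    // Hypothetical example weights; "content" stands in for the default field.
    static Map<String, Float> exampleBoosts() {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps field order
        boosts.put("title", 3.0f);
        boosts.put("owner", 2.0f);
        boosts.put("content", 1.0f);
        return boosts;
    }

    // Expand each whitespace-separated term into a group searching every
    // field, e.g. "(title:foo^3.0 owner:foo^2.0 content:foo)".
    // Phrases, operators (+, -, AND, ...) and escaping are NOT handled.
    static String expand(String text, Map<String, Float> fieldBoosts) {
        StringBuilder query = new StringBuilder();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) continue; // blank input splits to [""]
            if (query.length() > 0) query.append(' ');
            query.append('(');
            boolean first = true;
            for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
                if (!first) query.append(' ');
                first = false;
                query.append(e.getKey()).append(':').append(term);
                // A boost of 1.0 is the default, so it is left out.
                if (e.getValue() != 1.0f) query.append('^').append(e.getValue());
            }
            query.append(')');
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("lucene dates", exampleBoosts()));
        // (title:lucene^3.0 owner:lucene^2.0 content:lucene) (title:dates^3.0 owner:dates^2.0 content:dates)
    }
}
```

The resulting string would then be passed to the QueryParser as usual. With the default OR operator, a document matching in any field is found, and matches in the boosted fields weigh more.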

> this path do you have to leave the QueryParser behind and build your own
> queries?  Any best practices would be great.

You have some options:
- create the MultiTermQuery from the query text, or
- index all the searchable fields into the default search field, e.g. by
  concatenation, optionally inserting empty tokens in between to avoid
  proximity matches across field boundaries.
This has also been discussed recently; see e.g. the discussion on
indexing of sentences.
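The "empty tokens" idea can be sketched by looking only at token positions. When the source fields are concatenated into one field, leave a gap of unused positions between them, so a two-term phrase spanning the boundary would need a slop of at least the gap to match. This is a plain-Java simulation, not Lucene code. ConcatenatedField and positions are hypothetical, and producing the gap in a real analyzer (e.g. through token position increments) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConcatenatedField {
    // Give the tokens of each source field consecutive positions in one
    // combined field, skipping `gap` positions between consecutive fields.
    // Returns one array of positions per source field.
    static List<int[]> positions(List<String[]> fields, int gap) {
        List<int[]> result = new ArrayList<>();
        int next = 0;
        for (int f = 0; f < fields.size(); f++) {
            if (f > 0) next += gap; // the "empty tokens" between fields
            String[] tokens = fields.get(f);
            int[] pos = new int[tokens.length];
            for (int i = 0; i < tokens.length; i++) pos[i] = next++;
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> fields = Arrays.asList(
            new String[] {"dates", "in", "lucene"},  // e.g. title
            new String[] {"sorting", "results"});    // e.g. content
        for (int[] p : positions(fields, 10)) System.out.println(Arrays.toString(p));
        // [0, 1, 2]
        // [13, 14]
        // "lucene" (2) and "sorting" (13) are 11 apart, so the phrase
        // "lucene sorting" would need a slop of at least 10 to match.
    }
}
```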

Searching multiple fields is normally a little slower than searching a
concatenated field. The actual difference depends on your data, so
you might experiment a bit. You might e.g. index all fields
separately, and also index a default concatenated field.

Kind regards,
Ype Kingma
