lucene-java-user mailing list archives

From Günther Starnberger <...@sysfrog.org>
Subject max_score(multi_valued_field) function?
Date Tue, 02 May 2006 20:23:48 GMT
Hello,

I would like to use Lucene to index a set of articles, where several
different titles may belong to one single article. Currently I use a
field for the article as well as a multi-valued field for the titles.

My problem is:

- If I index only one of the titles, I won't get matches when someone
searches for one of the other titles. Of course, part of the content
may match too, but as the title is shorter, matches there will get a
higher score.

- If I index all of the possible titles in a multi-valued field, this
introduces noise and therefore bad results. The reason is that Lucene
effectively concatenates all the values of a multi-valued field, so
the field is scored as one long text. While a single one of these
values may be a perfect match on its own, this is no longer the case
once the alternative titles are indexed alongside it.
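
To make the effect concrete, here is a rough sketch. It assumes the
classic 1/sqrt(numTerms) length normalization and ignores every other
part of Lucene's scoring; the class name, method names, and example
titles are made up for illustration and are not Lucene API.

```java
// Sketch only: models a 1/sqrt(numTerms) length norm, not Lucene's full scoring.
class LengthNormDemo {

    // Length norm for a field that contains numTerms indexed terms (assumed formula).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Counts whitespace-separated terms across all values, treating the
    // values of a multi-valued field as one concatenated text.
    static int concatenatedLength(String... values) {
        int total = 0;
        for (String value : values) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty()) {
                total += trimmed.split("\\s+").length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical titles for one article.
        String[] titles = {
            "Lucene in Action",
            "Searching with Lucene",
            "A Guide to Full Text Search"
        };

        int single = concatenatedLength(titles[0]);
        int all = concatenatedLength(titles);

        System.out.println("norm, first title only: " + lengthNorm(single));
        System.out.println("norm, all titles:       " + lengthNorm(all));
    }
}
```

The more alternative titles are added to the field, the smaller the
norm becomes, so an exact match on one title is ranked lower than it
would be if that title were indexed alone.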

I have come up with some (hackish) solutions to this problem, like
indexing these alternative titles as whole new documents (together with
the content), or using a different field name for each title (e.g.
title01, title02, ...) and using a BooleanQuery to search on all
possible titles.

What I'm basically looking for is some way to get not the mean score
of a multi-valued field but the maximum score. Is there a more
elegant way to implement this? I've thought of things like indexing
multiple terms at the same position, but then there would still be
the problem that the lengths of the titles differ, and this would
also allow matches on wrong combinations of terms from different titles.
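
The behaviour I have in mind can be sketched as follows. This is a toy
model, not Lucene code: the score is simply the number of matching
query terms divided by sqrt(title length), and all names here are
hypothetical. Each title is scored on its own and the best score wins.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: a toy per-title score combined with max(), not Lucene's scoring.
class MaxTitleScore {

    // Toy score: matching query terms, scaled by 1/sqrt(number of title terms).
    static double score(String title, Set<String> queryTerms) {
        String[] tokens = title.toLowerCase().trim().split("\\s+");
        Set<String> titleTerms = new HashSet<>(Arrays.asList(tokens));
        int matches = 0;
        for (String term : queryTerms) {
            if (titleTerms.contains(term)) {
                matches++;
            }
        }
        return matches / Math.sqrt(tokens.length);
    }

    // Scores every title separately and keeps the maximum.
    static double maxScore(List<String> titles, Set<String> queryTerms) {
        double best = 0.0;
        for (String title : titles) {
            best = Math.max(best, score(title, queryTerms));
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical titles and query.
        List<String> titles = Arrays.asList(
            "Lucene in Action", "A Guide to Full Text Search");
        Set<String> query = new HashSet<>(Arrays.asList("lucene", "action"));

        System.out.println("max over titles: " + maxScore(titles, query));
        System.out.println("concatenated:    "
            + score(String.join(" ", titles), query));
    }
}
```

With the maximum, a perfect match on one title is not penalized by
the other titles, and terms from different titles cannot combine into
a single match.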

Any suggestions on how to solve this problem?

bye,
/gst
