lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: max_score(multi_valued_field) function?
Date Tue, 02 May 2006 20:57:46 GMT

: - If I index all of the possible titles in a multivalued field this
: introduces some kind of noise and therefore also bad results. The
: reason is that Lucene concatenates all the values of multi-valued
: fields when searching them. While a single one of this fields may be a
: perfect match this isn't the case when also indexing the alternative
: titles.

i can think of two possibilities you might be refering to when you say
"noise" ... one is that the lengthNorm for docs with many variant titles
causes matches in those titles to not score as well as documents with only
one title -- this can be dealt with by overriding the lengthNorm function
so that when used on your title field, it returns a constant value (or by
index your title Field with setOmitNorms(true)) ... this will completley
eliminate the length of all titles from scoring consideration, but it is
an option.
The second thing you might be refering to is phrase matches that span
multiple titles, ie a phrase search for "Big Bang" might
inadvertently be matching a document with two titles: "How I Made it Big
Big", "Bang: The Story of One man rose in the ranks of organized crime"
... this can be solved by using an Analyzer that has a high
positionIncrementGap.

: What I'm basically looking for is some way to not get the mean score
: of a multi-valued field but the maximum score. Is there some more
: elegant solution to implement this? I've thought of some things like

As far as I can figure, your best bet really is to use a seperate field
for each title -- but instead of combining the queries into a
BooleanQuery, use a DisjunctionMaxQuery with a tiebreaker value of 0.0f
... the whole purpose of that query is to score documents based on the
maximum score of a sub query, but you still have to make the sub queries
your self, and there's no easy way to make a query that only searches the
first "chunk" of terms from a field.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message