lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: max_score(multi_valued_field) function?
Date Wed, 03 May 2006 21:32:36 GMT
: > sub queries your self, and there's no easy way to make a query that
: > only searches the first "chunk" of terms from a field.
:
: I don't understand what you mean with "first 'chunk' of terms from a

In your specific case, I was referring to the first of multiple titles if
you *don't* use a separate field for each -- once the text is indexed,
there is no notion of unique values, just a stream of tokens with
relative positions... hence my poor choice of the word "chunk".

: Document 1 gets the higher score in this case (when I do a search on
: 'Amsterdam'). The reason is that the term 'Amsterdam' has a higher
: docFreq in Title-1 than in Title-2 (as Title-2 is only seldom used).

Hmmm... I hadn't considered that.

You could override Similarity.idf(Term,Searcher) to check whether the
termField is one of the title variants, and if so, use the sum of the
docFreq of the termText across all of the variant title fields.  My
visceral reaction to this idea was "ick"... but I can't think of any
legitimate reason why it would be a bad idea.
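
A minimal sketch of that idf idea, kept free of Lucene so it runs standalone. It assumes the variant fields are named title1, title2, title3 (hypothetical names) and models the docFreq lookups with a plain map instead of a Searcher. The idf formula is the one DefaultSimilarity uses, log(numDocs/(docFreq+1)) + 1. In real code the same logic would sit in a Similarity.idf(Term, Searcher) override that calls searcher.docFreq(...) once per variant field. Summing can count a document twice if the word appears in more than one of its titles; that is a trade-off, not a precise document count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinedTitleIdf {

    // Hypothetical names of the variant title fields; adjust to your index.
    public static final List<String> TITLE_FIELDS = List.of("title1", "title2", "title3");

    // Same shape as DefaultSimilarity's idf: log(numDocs / (docFreq + 1)) + 1
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Stand-in for Searcher.docFreq(Term): field -> (term text -> document frequency)
    public static int docFreq(Map<String, Map<String, Integer>> index, String field, String text) {
        return index.getOrDefault(field, Map.of()).getOrDefault(text, 0);
    }

    // For a title variant, sum the term's docFreq over all variants so the same
    // word gets the same idf no matter which title field it matched in.
    public static float titleAwareIdf(Map<String, Map<String, Integer>> index,
                                      String field, String text, int numDocs) {
        if (!TITLE_FIELDS.contains(field)) {
            return idf(docFreq(index, field, text), numDocs);
        }
        int total = 0;
        for (String f : TITLE_FIELDS) {
            total += docFreq(index, f, text);
        }
        return idf(total, numDocs);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = new HashMap<>();
        index.put("title1", Map.of("amsterdam", 40)); // common in the first title
        index.put("title2", Map.of("amsterdam", 2));  // rare in the seldom-used second title
        int numDocs = 1000;
        // Both lookups now use docFreq 42, so the two idf values are equal.
        System.out.println(titleAwareIdf(index, "title1", "amsterdam", numDocs));
        System.out.println(titleAwareIdf(index, "title2", "amsterdam", numDocs));
    }
}
```

With per-field docFreq, a match in the rarely used second title would get a much larger idf than the same word in the first title. The combined count removes that asymmetry, which is the effect described in the quoted Amsterdam example.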

: I think if I combine your .setOmitNorms(true) suggestion with this
: second suggestion it may actually work - that will be my next attempt.

Don't rule out the other idea I sent yesterday:  go back to using a
single title field with a high positionIncrementGap between variant
titles, put marker tokens at the start/end of each title, and make all
of your searches Phrase/SpanNear queries that include the start/end terms.
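
A rough, Lucene-free simulation of that layout follows. It models one multi-valued title field in which each value is wrapped in marker tokens, and each new value starts GAP positions after the previous one ends. The marker strings, the gap of 100, and the class and method names are all hypothetical. In Lucene itself the gap would come from overriding Analyzer.getPositionIncrementGap(String fieldName), and the queries would be PhraseQuery or SpanNearQuery objects rather than this hand-rolled exact-phrase check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GappedTitleField {

    // Hypothetical marker tokens and gap; the markers must never occur in real
    // titles, and the gap should exceed any slop your queries use.
    public static final String START = "_tstart_";
    public static final String END = "_tend_";
    public static final int GAP = 100;

    // Mimics indexing one multi-valued field: each title becomes
    // START, tokens..., END, and the next title begins GAP positions later.
    public static Map<String, List<Integer>> index(List<String> titles) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (String title : titles) {
            List<String> tokens = new ArrayList<>();
            tokens.add(START);
            for (String t : title.toLowerCase().split("\\s+")) {
                tokens.add(t);
            }
            tokens.add(END);
            for (String tok : tokens) {
                postings.computeIfAbsent(tok, k -> new ArrayList<>()).add(pos++);
            }
            pos += GAP; // position increment gap between field values
        }
        return postings;
    }

    // Exact phrase match (slop 0): the terms must occupy consecutive positions.
    public static boolean phraseMatches(Map<String, List<Integer>> postings, String... terms) {
        for (int p : postings.getOrDefault(terms[0], List.of())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = postings.getOrDefault(terms[i], List.of()).contains(p + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc = index(List.of("Amsterdam", "Hotels in Amsterdam"));
        // Whole-title match: some single title is exactly "amsterdam"
        System.out.println(phraseMatches(doc, START, "amsterdam", END)); // true
        // Some title starts with "hotels"
        System.out.println(phraseMatches(doc, START, "hotels")); // true
    }
}
```

Anchoring a query on START and END makes it match one whole title, and anchoring on START alone matches a title prefix. The markers by themselves already keep an exact phrase from running across two titles. The large gap is what stops sloppy phrase or SpanNear queries (with slop smaller than the gap) from doing so.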



-Hoss



