lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joachim Schreiber" <yos...@web.de>
Subject Re: Similarity - position in Field[] effects scoring - how to change?
Date Tue, 23 Mar 2004 16:54:37 GMT
> On Tuesday 23 March 2004 16:05, Joachim Schreiber wrote:
> > Hallo,
> >
> > I run in following problem. Perhaps somebody can help me.
> >
> > I have a index with different ids in the same field
> > something like
> >
> > <s>00000000
> > <s>45678565
> > <s>87854546
> >
> > Situation: I have different documents with the entry <s>00000000 in the
> > same index.
> >
> >
> > document 1)
> >
> > <s>324235678565
> > <s>324dssd5678565
> > <s>45678324565
> > <s>00000000
> > <s>8785454324326
> >
> >
> > document 2)
> >
> > <s>324235678565
> > <s>00000000
> > <s>45678324565
> > <s>8785454324326
> >
> >
> >
> > when I search for "  s:00000000 "  I receive both docs, but document 1
has
> > a better scoring than document 2.
>
> Since the s field of document 2 is shorter, I'd expect document 2 to score
> higher. As mentioned, lengthNorm() is responsible for this.
> Something does not add up here. Are the documents in the same index?
>
> > The position of <s>00000000 in doc 1 is Field[4] and in doc 2 it's
> > Field[2], so this seems to effect scoring.
>
> Lucene's default scoring is independent of absolute term positions.
>

hm...

> > How can I disable this behaviour, so doc 1 has the same scoring as doc
2???
>
> Simply ignore the score. The easiest way is to use the low level scoring
API
> with your own HitCollector. Just make sure not to retrieve document field
> values until you collected all your hits.

you think its possible to order by e.g. date field without retrieving all
the values from the index??

>
> > Which method do I have to overwrite in DefaultSimilarity.
> > Has anybody any idea, any help.
>
> In which order to you want the resulting documents presented?
> The low level api gives them in index order when the query consists
> of single search term, afaik.

in index order is ok but not very flexibel

Regards,
yo

>
> Regards,
> Ype
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message