lucene-dev mailing list archives

From "Nadav Har'El" <...@math.technion.ac.il>
Subject Re: Flexible index format / Payloads Cont'd
Date Fri, 30 Jun 2006 13:07:30 GMT
On Thu, Jun 29, 2006, Marvin Humphrey wrote about "Re: Flexible index format / Payloads Cont'd":
>   * Improve IR precision, by writing a Boolean Scorer that
>     takes position into account, a la Brin/Page '98.

Yes, I'd love to see that too (and it doesn't even require any new payload
support: the positions that Lucene already stores are enough).

I tried a small test using the TREC 8 corpus and query-relevance judgements,
and saw a noticeable improvement in precision when I added a simplistic
version of this feature: I "or"ed the original query words with a
SpanNearQuery for each pair of words in the query, so the query
"hot dog bun" is converted to something similar to:

	hot OR dog OR bun OR "hot dog"~7^0.25 "dog bun"~7^0.25 "hot bun"~7^0.25

But this "solution" is obviously not the best we can do: it is inefficient
(it goes through each posting list three times), and it is not tuned. A better
solution would be, as you said, to create a modified version of
BooleanQuery's scoring.
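
A minimal sketch of the pairwise expansion described above, written as plain
string construction so that it makes no Lucene calls. The class name
PairProximityExpansion and the method expand are hypothetical, chosen only for
this illustration. The test in this message built SpanNearQuery objects rather
than query strings, so this only reproduces the "something similar to" form
shown above. The slop of 7 and the boost of 0.25 are the values from that
example and are passed as parameters because they were not tuned. Every OR is
written explicitly instead of relying on the query parser's default operator.

```java
import java.util.ArrayList;
import java.util.List;

public class PairProximityExpansion {

    // Hypothetical helper: returns the original words OR'ed together,
    // followed by one sloppy, boosted phrase clause per unordered pair of words.
    public static String expand(String query, int slop, float boost) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        List<String> clauses = new ArrayList<>();

        // The original query words, unchanged.
        for (String word : words) {
            clauses.add(word);
        }

        // One proximity clause for each pair (i, j) with i < j.
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j < words.length; j++) {
                clauses.add("\"" + words[i] + " " + words[j] + "\"~" + slop + "^" + boost);
            }
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("hot dog bun", 7, 0.25f));
    }
}
```

For a query of n words this produces n term clauses plus n(n-1)/2 pair
clauses, and each word appears in n-1 of the pair clauses. That is why, for
three words, each posting list is read three times.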

-- 
Nadav Har'El                        |       Friday, Jun 30 2006, 4 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |Give Yogi a rifle. Support your right to
http://nadav.harel.org.il           |arm bears!
