lucene-java-user mailing list archives

From "Kyle Maxwell" <fizx.l...@gmail.com>
Subject Generalized proximity query performance
Date Thu, 04 Oct 2007 00:17:04 GMT
Hi again,

As the subject suggests, I'm trying to implement a layer of proximity
weighting over Lucene.  This has greatly improved search relevance, but
it has also reduced performance by a substantial amount (see the
benchmarks at the end of this message).

I am using a hand-rolled query of the following form (implemented with
SpanNearQuery, not a sloppy PhraseQuery):
a b c => +(a AND b AND c) OR "a b"~5 OR "b c"~5
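
To make the expansion above concrete, here is a minimal sketch in plain Java
that turns a whitespace-separated input into the same textual form.  The
class name ProximityExpansion and the method expand are hypothetical and
exist only for illustration.  The query described in this message is built
from SpanNearQuery objects, not from a query string, so this only shows how
the clauses are enumerated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the clause expansion only.
// It does not call Lucene and is not the implementation described above.
final class ProximityExpansion {

    // Builds: +(t1 AND t2 AND ...) OR "t1 t2"~slop OR "t2 t3"~slop ...
    static String expand(String userQuery, int slop) {
        String trimmed = userQuery.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] terms = trimmed.split("\\s+");

        List<String> clauses = new ArrayList<>();
        // Required conjunction over all terms.
        clauses.add("+(" + String.join(" AND ", terms) + ")");
        // One optional proximity clause per adjacent pair of terms.
        for (int i = 0; i + 1 < terms.length; i++) {
            clauses.add("\"" + terms[i] + " " + terms[i + 1] + "\"~" + slop);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // prints: +(a AND b AND c) OR "a b"~5 OR "b c"~5
        System.out.println(expand("a b c", 5));
    }
}
```

With this scheme an n-term input produces n - 1 proximity clauses, so
"paris hilton goes to jail" yields four of them next to the required
conjunction.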

The obvious solution, "a b c"~5, does not work for my case, because I
would like to allow for the possibility that a and b are near each other in
one field, while c is in another field.

So, is there something I'm missing that would make this performant?  Would
a solution based on clause reordering or query rewriting help?  If there's
no solution in existing Lucene, would anyone be interested in investigating
options with me?

-Kyle


Somewhat arbitrary benchmarks.
--------------
Before:
$ ./bench.rb "paris hilton"
0.022000   0.000000   0.022000 (  0.021000)
$ ./bench.rb "paris hilton goes to jail"
0.024000   0.000000   0.024000 (  0.024000)

After:
$ ./bench.rb "paris hilton"
0.103000   0.000000   0.103000 (  0.103000)
$ ./bench.rb "paris hilton goes to jail"
1.514000   0.000000   1.514000 (  1.513000)
