lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Generalized proximity query performance
Date Sun, 07 Oct 2007 22:18:58 GMT
: If I could intelligently rewrite queries, this would be better formulated
: as:
: title:"harry potter"~5 genre:books
: Instead, since I don't have that knowledge, I should perhaps rewrite several
: guesses, and take the dismax.  These guesses are equivalent to passing the

right.  okay.  the brute force approach of trying all possible 
permutations is really the only thing you can do unless you can think 
of ways to translate the "intelligence" that you would use to rewrite 
the query into code.  One start: test each "word" against each field and 
see if the idf is unusually high; if it is, then maybe it's a good idea to 
pull that word out of the phrase and use it to query that specific field 
... maybe you only do this on words at the beginning and end of the input?
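
a rough sketch of that idf test, in plain Java so it runs without an 
index.  everything here is an assumption made for illustration: the class 
and method names are hypothetical, the nested map is a stand-in for 
per-field document frequencies (against a real index the counts might come 
from something like IndexReader.docFreq(Term)), the idf formula only 
mirrors the shape of the classic 1 + ln(numDocs / (docFreq + 1)), and the 
cutoff for "unusually high" is an arbitrary parameter you would have to 
tune.  words that never occur in a field are skipped, since their idf 
would be maximal but meaningless.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class EdgeTermRewriter {

    // idf in the shape of the classic formula; an assumption, not a Lucene call.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Returns the non-phrase field where the word occurs with the highest
    // idf at or above minIdf, or null if no field qualifies.
    static String bestField(String word, String phraseField,
                            Map<String, Map<String, Integer>> docFreqs,
                            int numDocs, double minIdf) {
        String best = null;
        double bestIdf = minIdf;
        for (Map.Entry<String, Map<String, Integer>> e : docFreqs.entrySet()) {
            if (e.getKey().equals(phraseField)) continue;
            int df = e.getValue().getOrDefault(word, 0);
            if (df == 0) continue; // word never occurs in this field
            double score = idf(df, numDocs);
            if (score >= bestIdf) {
                best = e.getKey();
                bestIdf = score;
            }
        }
        return best;
    }

    // Pulls qualifying words off the end and the start of the input (edges
    // only) and turns them into field clauses; the rest stays a sloppy phrase.
    static String rewrite(String input, String phraseField, int slop,
                          Map<String, Map<String, Integer>> docFreqs,
                          int numDocs, double minIdf) {
        List<String> words =
                new ArrayList<>(Arrays.asList(input.trim().split("\\s+")));
        List<String> fieldClauses = new ArrayList<>();
        for (boolean fromEnd : new boolean[] {true, false}) {
            if (words.size() < 2) break; // keep at least one word for the phrase
            int pos = fromEnd ? words.size() - 1 : 0;
            String field = bestField(words.get(pos), phraseField,
                                     docFreqs, numDocs, minIdf);
            if (field != null) {
                fieldClauses.add(field + ":" + words.remove(pos));
            }
        }
        String text = String.join(" ", words);
        StringBuilder query = new StringBuilder(phraseField).append(':');
        if (words.size() > 1) {
            query.append('"').append(text).append("\"~").append(slop);
        } else {
            query.append(text);
        }
        for (String clause : fieldClauses) {
            query.append(' ').append(clause);
        }
        return query.toString();
    }
}
```

with "books" present in a hypothetical genre field and absent elsewhere, 
the input "harry potter books" comes back as the title phrase plus a 
genre:books clause.  no analysis (lowercasing, stemming) is applied to the 
words here, so real input would need the same treatment the index got.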

the problem becomes a lot simpler when you write code specific to your 
domain .. if you know you are dealing with "products" and you have a 
"type" field that only ever contains 1 of 50 values which frequently 
appear in search input (ie: books, couch, dvd) then testing that field
first makes a lot of sense ... the problem becomes much harder when you 
want it to work on any generic index under the sun without knowing 
anything about the user behavior.
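
the domain-specific case can be sketched even more simply, because the 
"type" vocabulary is small and known up front.  this is only an 
illustration under assumptions: the class and method names are 
hypothetical, the set of type values is supplied by the caller, "title" as 
the phrase field is borrowed from the quoted example, and only the first 
matching word is pulled out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class TypeFieldRewriter {

    // Checks each input word against the closed set of known type values
    // first; the first match becomes a type clause and the remaining words
    // stay together as a sloppy phrase on the phrase field.
    static String rewrite(String input, Set<String> knownTypes,
                          String typeField, String phraseField, int slop) {
        List<String> words = new ArrayList<>(
                Arrays.asList(input.trim().toLowerCase().split("\\s+")));
        String typeClause = null;
        for (int i = 0; i < words.size(); i++) {
            // Never strip the only word; something has to remain to search on.
            if (words.size() > 1 && knownTypes.contains(words.get(i))) {
                typeClause = typeField + ":" + words.remove(i);
                break;
            }
        }
        String text = String.join(" ", words);
        String phrase = words.size() > 1
                ? phraseField + ":\"" + text + "\"~" + slop
                : phraseField + ":" + text;
        return typeClause == null ? phrase : phrase + " " + typeClause;
    }
}
```

no index statistics are needed here at all, which is the point: knowing 
the domain replaces the guessing.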

: This is rather slow.  In the before/after, the numbers are in seconds,
: for one query, before and after this transformation has been made.

oh, well yeah ... no surprise there.  you can't compare benchmarks between 
two queries that do completely different things -- a "simple" query is 
probably always going to be faster than a more complex query that matches a 
different set of documents, or does a "better" job of scoring the same 
set as the simple query.  it's an apples and oranges thing.

