lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Appropriate analyzer
Date Tue, 28 Apr 2009 22:35:40 GMT

: try to use RegexQuery

Except that his input string is longer than the terms he wants to match.

It sounds like what you are looking for is essentially a simplified use 
case of the "longest matching sub-phrase" problem...

...except that you have the special case where (unless you simplified your 
example) you only care about the "longest matching prefix"

You could write an analyzer that splits the input on each character, and 
then concatenates each character with its offset:

Input: abcdef
Output: a_1, b_2, c_3, d_4, e_5, f_6

...in which case you use that analyzer when indexing; but at query time 
you use that analyzer to build a BooleanQuery (instead of a PhraseQuery 
like QueryParser would do by default) and now a search for "abcdef" will 
match "abcde" with a higher score than "abcd" but it won't match "bcdef" 
at all.
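
To make the idea concrete, here is a rough sketch in plain Java. It does 
not use the Lucene API at all: the class name and both methods are made up 
for illustration, and counting shared tokens is only a stand-in for what a 
BooleanQuery of optional clauses would do (real Lucene scoring also 
applies things like length normalization, so actual scores will differ).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of Lucene.
public class PositionalCharTokens {

    // Split the input into one token per character, each suffixed
    // with its 1-based offset: "abc" -> a_1, b_2, c_3
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            tokens.add(input.charAt(i) + "_" + (i + 1));
        }
        return tokens;
    }

    // Count how many query tokens also occur in the indexed value.
    // Assumption: this approximates a BooleanQuery whose clauses are
    // all optional; it is not Lucene's real scoring formula.
    static int overlap(String query, String indexed) {
        Set<String> indexedTokens = new HashSet<>(tokenize(indexed));
        int hits = 0;
        for (String token : tokenize(query)) {
            if (indexedTokens.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Compare the query "abcdef" against three indexed values.
        for (String indexed : new String[] {"abcde", "abcd", "bcdef"}) {
            System.out.println(indexed + " -> " + overlap("abcdef", indexed));
        }
    }
}
```

With the query "abcdef" this gives 5 for "abcde", 4 for "abcd" and 0 for 
"bcdef", since none of the tokens of "bcdef" carry the same offsets as the 
query tokens. Note that under this simple count an indexed value longer 
than the query can still overlap it in every position, so preferring only 
complete prefixes may need something extra.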

Out of curiosity: what's your specific use case?  I've never heard of 
anyone wanting to match on something character-by-character like this 
(usually it's the reverse: people want "abcd" to match "abcde")

: > Actually, what I need is next: search on a query string step-by-step,
: > trimming last char on each step. Small example:
: > 
: > In index we've: abc, abcdef, xyz
: > When search on abcdefgh the most relevant result should be abcdef, while
: > searching on abcde the best one is abc.

