lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ard Schrijvers" <>
Subject Fastest way to perform 'like' searches
Date Wed, 08 Aug 2007 08:28:06 GMT

I need to do a search that is capable to also match on substrings, for example:

*oo bar the qu*

should find a document that contains 'foo bar the quux' and 'foo bar the qux'. Now, should
I index the text as UN_TOKENIZED also, and do a WildCardQuery on this field? Obviously, then
every blobtext is added as a single term in lucene. Clearly, this doesn't scale at all, and
searching becomes very slow. 

Does anybody know a more efficient way? A PhraseQuery might get me somewhere, isn't? Does
PhraseQuery allow wildcards in the phrase? But, as a phrase is analyzed according some analyzer
it might strip the 'the' as a stopword, implying that *oo bar qu* would also match, right?

I know the requirements is a little strange, but it is part of the JSR-170 specification (sql
'like' or xpath 'jcr:like' which mimics the sql like in db)

Thanks for any pointers 



Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
------------------------------------------------------------- / /

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message