lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ard Schrijvers" <a.schrijv...@hippo.nl>
Subject Fastest way to perform 'like' searches
Date Wed, 08 Aug 2007 08:28:06 GMT
Hello,

I need to do a search that is capable to also match on substrings, for example:

*oo bar the qu*

should find a document that contains 'foo bar the quux' and 'foo bar the qux'. Now, should
I index the text as UN_TOKENIZED also, and do a WildCardQuery on this field? Obviously, then
every blobtext is added as a single term in lucene. Clearly, this doesn't scale at all, and
searching becomes very slow. 

Does anybody know a more efficient way? A PhraseQuery might get me somewhere, isn't? Does
PhraseQuery allow wildcards in the phrase? But, as a phrase is analyzed according some analyzer
it might strip the 'the' as a stopword, implying that *oo bar qu* would also match, right?

I know the requirements is a little strange, but it is part of the JSR-170 specification (sql
'like' or xpath 'jcr:like' which mimics the sql like in db)

Thanks for any pointers 

Ard

-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
-------------------------------------------------------------- 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message