lucene-java-user mailing list archives

From Justin Swanhart <greenl...@gmail.com>
Subject prefix wildcard matching options (*blah)
Date Thu, 04 Nov 2004 17:52:21 GMT
I'm thinking about making a separate field in my index for prefix
wildcard searches.
I would chop off x characters from the front to create "subtokens" for
the prefix matches.

For the term: republican
terms created: republican epublican publican ublican blican
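The expansion in the example above can be sketched in plain Java as follows. This is a minimal sketch, not the poster's code: the `minLength` cutoff is a hypothetical parameter (the post doesn't say where the chopping stops), chosen here so that a value of 6 reproduces the list ending at "blican".

```java
import java.util.ArrayList;
import java.util.List;

class SuffixTerms {
    /**
     * Returns the full term followed by each shorter suffix that is still at
     * least minLength characters long. minLength is a hypothetical cutoff;
     * the original post does not say where the chopping should stop.
     */
    static List<String> suffixes(String term, int minLength) {
        List<String> out = new ArrayList<>();
        if (term.isEmpty()) {
            return out; // nothing to index
        }
        int min = Math.max(1, minLength); // never emit empty suffixes
        out.add(term); // the whole word is always kept
        for (int start = 1; term.length() - start >= min; start++) {
            out.add(term.substring(start)); // drop one more leading character
        }
        return out;
    }
}
```

For example, `SuffixTerms.suffixes("republican", 6)` returns `[republican, epublican, publican, ublican, blican]`. Always keeping the full word means short words like "cat" still land in the field even when they are below the cutoff.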

My query parser would then check whether a term has a wildcard as its
first character. If so, instead of searching the normal field, it
would remove the wildcard from the start of the term and search on the
prefix field instead.

A search for "*pub*" would be converted to "pub*" in the prefix field.
A search for "*blican" would be converted to "blican".
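The rewriting rule described above could look roughly like this, working on one query term at a time and producing a `field:term` string. It is a sketch under assumptions: the field names `contents` and `prefix` are hypothetical (the post doesn't name them), only a leading `*` is handled, and the real parser integration is left out.

```java
class LeadingWildcardRewriter {
    // Hypothetical field names; the original post does not name its fields.
    static final String NORMAL_FIELD = "contents";
    static final String PREFIX_FIELD = "prefix";

    /** Rewrites a single query term into "field:term" form. */
    static String rewrite(String term) {
        if (!term.startsWith("*")) {
            return NORMAL_FIELD + ":" + term; // no leading wildcard: search as usual
        }
        // Strip every leading '*'; the remainder is matched against the stored suffixes.
        int i = 0;
        while (i < term.length() && term.charAt(i) == '*') {
            i++;
        }
        String rest = term.substring(i);
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("pattern has no literal characters: " + term);
        }
        // "*pub*" becomes a prefix query "pub*"; "*blican" becomes the exact term "blican".
        return PREFIX_FIELD + ":" + rest;
    }
}
```

So `rewrite("*pub*")` gives `prefix:pub*`, `rewrite("*blican")` gives `prefix:blican`, and `rewrite("repub*")` is left on the normal field. One consequence of the design: if the index stops chopping at some minimum suffix length, an exact remainder shorter than that cutoff (say "*can") matches only whole words, while a remainder that still ends in `*` keeps working as a prefix query.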

Does this sound like an intelligent way to create fast prefix querying ability?

Can I index the prefix field with a separate analyzer that makes the
prefix tokens, or should I just do the index-time expansion manually?
I wouldn't need to search with this analyzer, just index with it,
because the searching doesn't have to expand all those terms.

If using a separate analyzer for the prefix field makes more sense, how
do I make a tokenizer that returns multiple tokens for one word?
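A common way to emit several tokens for one word is a filter that sits after the tokenizer, buffers the extra tokens for the current word, and hands them out one per call before pulling the next word. The sketch below shows that buffering idea in plain Java with an `Iterator<String>` standing in for the token stream. It deliberately does not reproduce Lucene's `TokenFilter` API; the class name `SuffixFilter` and the `minLength` cutoff are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Plain-Java analogy of a token filter: wraps a stream of words and emits
 * each word followed by its suffixes down to a hypothetical minimum length.
 */
class SuffixFilter implements Iterator<String> {
    private final Iterator<String> input;                     // upstream "tokenizer"
    private final int minLength;                              // hypothetical cutoff
    private final Deque<String> pending = new ArrayDeque<>(); // tokens not yet emitted

    SuffixFilter(Iterator<String> input, int minLength) {
        this.input = input;
        this.minLength = Math.max(1, minLength); // never emit empty suffixes
    }

    @Override
    public boolean hasNext() {
        // Refill the buffer from the next upstream word whenever it runs dry.
        while (pending.isEmpty() && input.hasNext()) {
            String word = input.next();
            if (word.isEmpty()) {
                continue; // skip empty input tokens
            }
            pending.add(word); // the whole word first
            for (int start = 1; word.length() - start >= minLength; start++) {
                pending.add(word.substring(start)); // then each shorter suffix
            }
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.poll();
    }
}
```

Fed the words "the" and "republican" with a cutoff of 6, this yields the, republican, epublican, publican, ublican, blican. In Lucene the same buffering would live in a token filter used only by the index-time analyzer for the prefix field (for example wired in through a per-field analyzer wrapper); the exact `TokenStream` methods differ between Lucene versions, so check the documentation for the version in use.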
