lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: incomplete word match
Date Thu, 11 Mar 2004 12:29:02 GMT
On Thursday 11 March 2004 06:15, Tomcat Programmer wrote:
> I have a situation where I need to be able to find
> incomplete word matches, for example a search for the
> string 'ape' would return matches for 'grapes'
> 'naples' 'staples' etc.  I have been searching the
> archives of this user list and can't seem to find any
> example of someone doing this.
>
> At one point I recall finding someone's site (on
> Google) who indicated that their search engine was
> Lucene, and they offered the capability of doing this
> type of matching. However I can't seem to find that
> site again to save my life!
>
> Has anyone been successful in implementing this type
> of matching with Lucene? If so, would you be able to
> share some insight as to how you did it?

I haven't actually done this, but as a first attempt I would index
all the suffixes of each word in a separate field and search that
field with a PrefixQuery.  You would index e.g. google as:
google oogle ogle gle le e
all at the same position. To search for the substring ogl you
would query ogl* on that field.
To save space you might impose a minimum suffix length.
The minimum query length should preferably be the same, because a
shorter query cannot match the short suffixes that were left out.
Your index will grow quite a bit, but it's difficult to say how much.
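
To make the idea concrete, here is a minimal sketch in plain Java. It does
not use Lucene at all: the sorted map is only a stand-in for the term
dictionary of the suffix field, and the class and method names
(SuffixIndexSketch, suffixes, substringSearch) are made up for illustration.
It shows that a prefix lookup over indexed suffixes behaves like a substring
search, and why queries shorter than the minimum suffix length are refused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a suffix field; not a Lucene class.
class SuffixIndexSketch {
    // Sorted map from suffix term to the words that contain it.
    private final TreeMap<String, TreeSet<String>> terms = new TreeMap<>();
    private final int minLength;

    SuffixIndexSketch(int minLength) {
        this.minLength = Math.max(1, minLength);
    }

    // Every suffix of word with at least minLength characters, longest first.
    static List<String> suffixes(String word, int minLength) {
        int min = Math.max(1, minLength);
        List<String> result = new ArrayList<>();
        for (int start = 0; start + min <= word.length(); start++) {
            result.add(word.substring(start));
        }
        return result;
    }

    // "Index" a word under each of its suffixes.
    void add(String word) {
        for (String suffix : suffixes(word, minLength)) {
            terms.computeIfAbsent(suffix, k -> new TreeSet<>()).add(word);
        }
    }

    // Prefix lookup over the suffix terms, i.e. the effect of querying query*.
    Set<String> substringSearch(String query) {
        Set<String> hits = new TreeSet<>();
        if (query.length() < minLength) {
            return hits; // too short: suffixes this short were never indexed
        }
        // Terms sharing a prefix are contiguous in sorted order.
        for (Map.Entry<String, TreeSet<String>> e : terms.tailMap(query, true).entrySet()) {
            if (!e.getKey().startsWith(query)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }
}
```

With a minimum length of 3, adding grapes, shapes and google and then
searching for ape finds grapes and shapes through the term apes, and
searching for ogl finds google through the term ogle.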

You can do this by providing your own TokenStream on the field
that returns each suffix as a Token with a getPositionIncrement()
of zero, just after the normal full Token (google), which keeps its
increment of 1. See also:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/Token.html
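
As a rough illustration of the token sequence such a TokenStream would
produce, here is a sketch in plain Java. The Tok class below is a made-up
stand-in for Lucene's Token, and expand and positions are hypothetical
helpers; a real implementation would subclass TokenStream and set the
increment on each Token instead. It assumes the first token lands on
position 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the token sequence; not Lucene code.
class SuffixTokenSketch {
    // Stand-in for a token: its text plus its position increment.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emit each full word with increment 1, then its shorter suffixes with increment 0.
    static List<Tok> expand(List<String> words, int minLength) {
        int min = Math.max(1, minLength);
        List<Tok> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Tok(word, 1)); // the normal full token
            for (int start = 1; start + min <= word.length(); start++) {
                out.add(new Tok(word.substring(start), 0)); // same position as the word
            }
        }
        return out;
    }

    // Absolute positions obtained by summing increments, starting just before 0.
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (Tok t : tokens) {
            position += t.positionIncrement;
            result.add(position);
        }
        return result;
    }
}
```

For the words google and maps with a minimum length of 3, this yields
google, oogle, ogle and gle at position 0, followed by maps and aps at
position 1, so the suffixes do not shift the positions of the words that
follow.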

Paul

