lucene-java-user mailing list archives

From "Eric Isakson" <Eric.Isak...@sas.com>
Subject Partial token matches
Date Wed, 26 Apr 2006 16:20:19 GMT
Hi All,

I wanted to describe something I'm working on. It works well for me, but I'd like to know
whether anyone can suggest alternatives that might perform better than what I'm doing now.

I have a field in my index that contains keywords (back-of-the-book index terms) and a UI
feature that lets the user find documents containing a partial keyword. For example, a
document in my index might have the token "informat" in the keywords field; if the user
supplies "form" in the UI, that document should match.

My old implementation does not use Lucene; it calls String.matches with a regular expression
that looks like ".*form.*". I reimplemented it with Lucene by tokenizing the field so that
each keyword is indexed as all of its suffixes. For "informat", I get the tokens

informat
nformat
format
ormat
rmat
mat
at
t
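
For illustration, the suffix tokens listed above could be produced by something like the
following. This is only a minimal sketch in plain Java, not a Lucene analyzer; the class and
method names (SuffixTokens, suffixes) are hypothetical, and lower-casing is assumed as the
way of ignoring case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SuffixTokens {
    // Returns every suffix of the keyword, longest first.
    // The keyword is lower-cased first (an assumption) so matching can ignore case.
    static List<String> suffixes(String keyword) {
        String term = keyword.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            out.add(term.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints one suffix of "informat" per line.
        for (String token : suffixes("informat")) {
            System.out.println(token);
        }
    }
}
```

A keyword of length n yields n tokens, so the index grows with keyword length, which is
acceptable here because index size is not a concern.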

Then I use a prefix query to find hits. Both implementations ignore case in the search, and
the hit order is controlled by another field that I sort on, so relevance ranking is not
important in this use case. Search-time performance is crucial; index creation time and index
size are not really important. The index is created statically at application startup, or
possibly delivered to the application, and is not updated while the application is using
it.
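
To show why a prefix lookup over suffix tokens gives the same hits as the ".*form.*" regular
expression, here is a rough sketch that uses a sorted map as a stand-in for the Lucene term
dictionary. It does not use any Lucene API; the class SuffixPrefixIndex, its methods, and the
integer document ids are all hypothetical names chosen for the example.

```java
import java.util.Locale;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

class SuffixPrefixIndex {
    // Sorted map from suffix token to the ids of documents containing it.
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    // Indexes every suffix of the lower-cased keyword for the given document.
    void add(int docId, String keyword) {
        String term = keyword.toLowerCase(Locale.ROOT);
        for (int start = 0; start < term.length(); start++) {
            postings.computeIfAbsent(term.substring(start), k -> new TreeSet<>()).add(docId);
        }
    }

    // Finds documents whose keyword contains the fragment anywhere.
    // A substring of a keyword is always a prefix of one of its suffixes,
    // so a prefix range scan over the suffix tokens is sufficient.
    Set<Integer> search(String fragment) {
        String prefix = fragment.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        if (prefix.isEmpty()) {
            return hits;
        }
        // Keys in [prefix, prefix + '\uffff') are the tokens starting with prefix,
        // assuming no token contains the character '\uffff' itself.
        SortedMap<String, Set<Integer>> range =
                postings.subMap(prefix, prefix + Character.MAX_VALUE);
        for (Set<Integer> docs : range.values()) {
            hits.addAll(docs);
        }
        return hits;
    }
}
```

With documents 1 ("informat"), 2 ("format") and 3 ("index") added, search("form") returns
documents 1 and 2, the same documents whose keywords match ".*form.*". The lookup cost depends
on the number of matching tokens rather than on scanning every keyword.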

Thanks for any suggestions,
Eric
