lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: Performing a like query
Date Mon, 02 Oct 2006 17:46:42 GMT
: I have a custom-built Analyzer where I tokenize all non-whitespace
: characters as well available in the field "TERM" (which is the only
: field being tokenised).

: If I now query my index file for a term "6/12" for instance, I get back
: only ONE result

: instead of TWO. There is another token in the index file of the form
: 2561280012    0    163939000    R-eye=6/12 (finding)    0    3    en

it sounds like you are getting exactly what you should be: since you
tokenize on whitespace, "R-eye=6/12" is indexed as a single term, and that
term is not "6/12".  if you want that second match to show up as well
then you probably need to tokenize on "=" as well, or maybe tokenize on
all punctuation and let a search for "6/12" become a phrase search for "6"
followed by "12" ... it's really just a question of what exactly you
want to be able to match on.
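
to make the difference concrete, here is a rough sketch in plain Java (no
lucene classes at all; the class and method names are made up for
illustration, and a real Analyzer would do more than this).  it tokenizes
the record from your message both ways and checks whether "6/12" can match:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: plain string splitting, not a Lucene Analyzer.
class TokenizeSketch {

    // Split on runs of whitespace only, like the analyzer described in the question.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Split on every run of characters that are not letters or digits.
    static List<String> punctuationTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // A phrase matches when the query tokens appear consecutively in the document tokens.
    static boolean phraseMatch(List<String> doc, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        String record = "R-eye=6/12 (finding)";
        String query = "6/12";

        // Whitespace tokens: the query has to equal a whole term.
        System.out.println(whitespaceTokens(record).contains(query));

        // Punctuation tokens: the query becomes the phrase "6" followed by "12".
        System.out.println(phraseMatch(punctuationTokens(record), punctuationTokens(query)));
    }
}
```

with whitespace tokens the record holds the terms "R-eye=6/12" and
"(finding)", so an exact term lookup for "6/12" finds nothing; with
punctuation tokens the record becomes R, eye, 6, 12, finding and the phrase
"6" "12" is found.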

if you truly want "substring" matching, so that searches for "ye=6/" or
"12 (find" would both match that second record, then you need to get more
creative ... lucene isn't designed for "substring" matching, it's
optimized for "term" matching.  substring matching is possible with
wildcard queries or index time tricks ... but those are really just hacks
to get around a fundamental difference in purpose.
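
as one example of what an "index time trick" could look like (this is just
one possibility, not the only one): index every 3-character slice of the
field as its own term, look up the slices of the query, and then verify the
surviving candidates with a real substring check.  the sketch below is plain
Java with made-up names, no lucene API, and the second record in it is
invented for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: a tiny character n-gram index built by hand.
class NgramSketch {
    static final int N = 3; // gram length, an arbitrary choice for this sketch

    // All distinct N-character slices of the text.
    static Set<String> grams(String text) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + N <= text.length(); i++) {
            out.add(text.substring(i, i + N));
        }
        return out;
    }

    // Inverted index: gram -> ids of the documents that contain it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String g : grams(docs.get(id))) {
                index.computeIfAbsent(g, k -> new HashSet<>()).add(id);
            }
        }
        return index;
    }

    // Ids of documents containing the query as a substring, in ascending order.
    static List<Integer> substringSearch(List<String> docs,
                                         Map<String, Set<Integer>> index,
                                         String query) {
        Set<Integer> candidates = new TreeSet<>();
        if (query.length() < N) {
            // Too short to produce a gram: fall back to checking every document.
            for (int id = 0; id < docs.size(); id++) candidates.add(id);
        } else {
            boolean first = true;
            for (String g : grams(query)) {
                Set<Integer> postings = index.getOrDefault(g, Collections.emptySet());
                if (first) { candidates.addAll(postings); first = false; }
                else candidates.retainAll(postings);
            }
        }
        // Sharing all grams is necessary but not sufficient, so verify each candidate.
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates) {
            if (docs.get(id).contains(query)) hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        // The first record is from the question; the second is invented.
        List<String> docs = Arrays.asList("R-eye=6/12 (finding)", "L-eye=6/9 (finding)");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(substringSearch(docs, index, "ye=6/"));
        System.out.println(substringSearch(docs, index, "12 (find"));
    }
}
```

note the cost: every field value turns into many extra terms, which is a big
part of why this counts as a hack rather than something the index is built
for.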

just because you have a great hammer doesn't mean you should look at every
problem as a nail, but i don't think that's exactly your problem.  i'm
guessing you just aren't used to thinking about your problem from a "term
matching" standpoint; you're probably really used to thinking in terms of DB
queries and LIKE and string matching ... don't think in terms of ways you'd
solve your problems with other tools .. ask yourself what you really want to
be able to do, what kinds of things you really want to match on, and then see
if lucene is the right tool for the job.


