lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From d-fader <>
Subject Partial / starts with searching
Date Fri, 13 Feb 2009 08:05:40 GMT

I've actually posted this message in de dev mailing list earlier,
because I though my 'issue' is a limitation of the functionality of
Lucene, but they redirected me to this mailinglist, so I hope one of you
guys can help me out :)

Maybe the 'issue' I'm addressing now is discussed thouroughly already,
in that case I think I need some redirection to the sources of those
discussions :) Anyway, here's the thing.
For all I know it's impossible to search partial words with Lucene
(except the asterix method with e.g. the StandardAnalyzer -> ambul* to
find ambulance). My problem with that method is that my index consists
of quite a few terms. This means that if a user would search for 'ambu
amster' (ambulance amsterdam), there will be so many terms to search,
the waiting time is just inacceptable. Now I started thinking why it's
impossible to search only a 'part' of a term or even only the 'start' of
a term and the only reason I could think of was that the Index terms are
stored tokenized (in that way you (of course) can't find partial terms,
since the index doesn't actually contain the literal terms, but tokens
instead). But Lucene can also store all terms untokenized, so in that
case, in my humble opinion, a partial search would be possible, since
all terms would be stored 'literally'.

Maybe my thinking is wrong, I only have a black box view of Lucene, so I
don't know much about indexing algorithm and all, but I just want to
know if this could be done or else why not :) You see, the users of my
index want to know why they can't search parts of the words they enter
and I still can't give them a really good answer, except the 'it would
result in too many OR operators in the query' statement :) . I've tried
using a Dutch stemmer (most of the data I'm indexing is Dutch) but that
didn't work out quite good. Furthermore users sometimes search for a
certain 'filename' and mostly they just enter a part of the name and
thus don't find anything.

I hope someone can enlighten me :) Thanks in advance!


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message