lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wettin <>
Subject Re: Partial / starts with searching
Date Fri, 13 Feb 2009 08:39:34 GMT
Hi again Jori,

did you try N-grams as suggested in the reply on -dev?


13 feb 2009 kl. 09.05 skrev d-fader:

> Hi,
> I've actually posted this message in de dev mailing list earlier,
> because I though my 'issue' is a limitation of the functionality of
> Lucene, but they redirected me to this mailinglist, so I hope one of  
> you
> guys can help me out :)
> Maybe the 'issue' I'm addressing now is discussed thouroughly already,
> in that case I think I need some redirection to the sources of those
> discussions :) Anyway, here's the thing.
> For all I know it's impossible to search partial words with Lucene
> (except the asterix method with e.g. the StandardAnalyzer -> ambul* to
> find ambulance). My problem with that method is that my index consists
> of quite a few terms. This means that if a user would search for 'ambu
> amster' (ambulance amsterdam), there will be so many terms to search,
> the waiting time is just inacceptable. Now I started thinking why it's
> impossible to search only a 'part' of a term or even only the  
> 'start' of
> a term and the only reason I could think of was that the Index terms  
> are
> stored tokenized (in that way you (of course) can't find partial  
> terms,
> since the index doesn't actually contain the literal terms, but tokens
> instead). But Lucene can also store all terms untokenized, so in that
> case, in my humble opinion, a partial search would be possible, since
> all terms would be stored 'literally'.
> Maybe my thinking is wrong, I only have a black box view of Lucene,  
> so I
> don't know much about indexing algorithm and all, but I just want to
> know if this could be done or else why not :) You see, the users of my
> index want to know why they can't search parts of the words they enter
> and I still can't give them a really good answer, except the 'it would
> result in too many OR operators in the query' statement :) . I've  
> tried
> using a Dutch stemmer (most of the data I'm indexing is Dutch) but  
> that
> didn't work out quite good. Furthermore users sometimes search for a
> certain 'filename' and mostly they just enter a part of the name and
> thus don't find anything.
> I hope someone can enlighten me :) Thanks in advance!
> Jori
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message