lucene-dev mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Partial / starts with searching
Date Thu, 05 Feb 2009 09:34:36 GMT
Hi Jori,

Your question is better suited to the java-user list; on this list we
discuss development of the API itself.

To answer your question: n-grams might solve your problem. Tokenizers
are available in contrib/analyzers.
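
A minimal sketch of the idea, assuming "ngrams" here means edge n-grams (the leading prefixes of each term) produced at index time. It uses plain Java collections instead of the contrib/analyzers tokenizers, and the class name, method names, and gram sizes (2 to 10) are hypothetical choices made for illustration. The point it shows is that once the prefixes are indexed as terms, a query such as 'ambu amster' becomes two exact term lookups rather than a wildcard expansion over many terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; this is not Lucene code.
final class EdgeNGramSketch {

    // Toy inverted index: gram -> sorted set of document ids.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Returns the prefixes of term with lengths minGram..maxGram,
    // capped at the length of the term itself.
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Index time: every word contributes all of its prefixes as terms.
    // Prefixes longer than 10 characters are not indexed in this sketch.
    void add(int docId, String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            for (String gram : edgeNGrams(word, 2, 10)) {
                postings.computeIfAbsent(gram, g -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Query time: each query word is one exact lookup; the per-word
    // results are intersected (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String word : query.toLowerCase().split("\\s+")) {
            Set<Integer> docs = postings.getOrDefault(word, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

The trade-off is a larger index (one extra term per prefix length) in exchange for cheap queries. Matching in the middle of a word would need full n-grams rather than only the leading ones.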


         karl

On 5 Feb 2009, at 10:19, d-fader wrote:

> Hi,
>
> I'm new to this list, so please don't be too harsh if I missed some
> rules or something. I've been using Lucene for about half a year and
> I think it's awesome; respect for all your efforts!
>
> Maybe the 'issue' I'm raising now has already been discussed
> thoroughly; in that case I just need a pointer to those
> discussions :) Anyway, here's the thing.
> As far as I know, it's impossible to search for partial words with
> Lucene (except with the asterisk method, e.g. with the
> StandardAnalyzer: ambul* to find ambulance). My problem with that
> method is that my index contains quite a lot of terms. This means
> that if a user searches for 'ambu amster' (ambulance amsterdam),
> there are so many terms to search that it's not doable. So I started
> wondering why it's impossible to search for only a 'part' of a term,
> or even only the 'start' of a term, and the only reason I could think
> of was that the index terms are stored tokenized (in which case you
> (of course) can't find partial terms, since the index doesn't contain
> the literal terms, but tokens instead). But Lucene can also store all
> terms untokenized, so in that case a partial search should be
> possible, in my humble opinion, since all terms would be stored
> 'literally'.
>
> Maybe my thinking is wrong. I only have a black-box view of Lucene,
> so I don't know much about the indexing algorithm and so on, but I
> just want to know whether this could be done, and if not, why not :)
> You see, the users of my index want to know why they can't search for
> parts of the words they enter, and I still can't give them a really
> good answer, except the 'it would result in too many OR operators in
> the query' statement :)
>
> Thanks in advance!
>
> Jori
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>



