lucene-dev mailing list archives

From d-fader <>
Subject Re: Partial / starts with searching
Date Thu, 05 Feb 2009 09:39:39 GMT
I posted it to this list because I thought it was more of a development 
'issue', but thanks for the quick answer. I'll check out the ngrams and, 
if needed, repost my message on the users list. Thanks again!
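
For reference, the ngram idea from the reply quoted below can be sketched in plain Java. This is only an illustration of the technique, not Lucene code: the class `EdgeNGramSketch`, its methods, and the `MIN_GRAM` / `MAX_GRAM` bounds are made up for the example. The idea is to index every leading prefix of each term (its "edge" ngrams), so that a query like 'ambu amster' becomes two exact term lookups instead of two wildcards that each expand into many OR clauses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class EdgeNGramSketch {
    // Hypothetical bounds: shortest and longest prefix stored per term.
    static final int MIN_GRAM = 2;
    static final int MAX_GRAM = 10;

    // Toy inverted index: prefix -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // All leading prefixes of a term, e.g. "ambu" -> [am, amb, ambu].
    static List<String> edgeNGrams(String term) {
        List<String> grams = new ArrayList<>();
        int max = Math.min(MAX_GRAM, term.length());
        for (int len = MIN_GRAM; len <= max; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Index time: store every prefix of every whitespace-separated term.
    void add(int docId, String text) {
        for (String term : text.trim().toLowerCase().split("\\s+")) {
            for (String gram : edgeNGrams(term)) {
                postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Query time: one exact lookup per query word, AND-ed together.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String part : query.trim().toLowerCase().split("\\s+")) {
            Set<Integer> hits = postings.getOrDefault(part, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        EdgeNGramSketch index = new EdgeNGramSketch();
        index.add(1, "ambulance amsterdam");
        index.add(2, "ambulance rotterdam");
        index.add(3, "taxi amsterdam");
        System.out.println(index.search("ambu amster")); // prints [1]
    }
}
```

The trade-off is a larger index in exchange for cheap queries: each term is stored once per prefix length, and in this sketch query words shorter than `MIN_GRAM` or longer than `MAX_GRAM` match nothing.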

Karl Wettin wrote:
> Hi Jori,
> your question is better suited to the java-users list; on this list we 
> discuss development of the API.
> To answer your question: ngrams might solve your problem. Tokenizers 
> are available in contrib/analyzers.
>         karl
> On 5 Feb 2009, at 10:19, d-fader wrote:
>> Hi,
>> I'm new to this list, so please don't be too harsh if I missed some 
>> rules or something. I've been using Lucene for about half a year and I 
>> think it's awesome, respect for all your efforts!
>> Maybe the 'issue' I'm raising now has already been discussed 
>> thoroughly; in that case I'd appreciate a pointer to those 
>> discussions :) Anyway, here's the thing.
>> As far as I know, it's impossible to search for partial words with 
>> Lucene (except with the asterisk method, e.g. with the StandardAnalyzer: 
>> ambul* to find ambulance). My problem with that method is that my index 
>> contains quite a lot of terms. This means that if a user searches 
>> for 'ambu amster' (ambulance amsterdam), there will be so many terms 
>> to search that it's not doable. So I started wondering why it's 
>> impossible to search for only a 'part' of a term, or even only the 
>> 'start' of a term. The only reason I could think of was that the index 
>> terms are stored tokenized (in that case you of course can't find 
>> partial terms, since the index doesn't actually contain the literal 
>> terms, but tokens instead). But Lucene can also store all terms 
>> untokenized, so in that case a partial search should be possible, in my 
>> humble opinion, since all terms would be stored 'literally'.
>> Maybe my thinking is wrong. I only have a black-box view of Lucene, 
>> so I don't know much about the indexing algorithm and so on, but I just 
>> want to know whether this could be done, or else why not :) You see, the 
>> users of my index want to know why they can't search for parts of the 
>> words they enter, and I still can't give them a really good answer, 
>> except the 'it would result in too many OR operators in the query' 
>> statement :)
>> Thanks in advance!
>> Jori