lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: using different analyzer for searching
Date Fri, 01 Apr 2005 14:22:53 GMT

On Mar 31, 2005, at 10:49 PM, pashupathinath wrote:
>    I should do more analysis, as you suggested, before deciding
> which analyzer to use. What about writing a custom analyzer to
> solve this? How can I go about implementing the logic in a custom
> analyzer, so that it returns all the documents that contain even
> a part of the search string?
>    Any insight into this would be very helpful, especially in
> terms of performance.

This is an involved topic, and one that is covered in great detail in 
the analysis chapter of Lucene in Action (shameless plug, yes, I 
know!).

I recommend you analyze the types of queries that need to be made and 
what type of user interface you will present for this - then determine 
what makes the most sense analysis-wise.  WhitespaceAnalyzer is not 
going to be good enough, as I suspect you'll want case-insensitive 
searches at least.
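
To illustrate the case-sensitivity point, here is a minimal plain-Java sketch. It uses no Lucene classes; the class and method names are hypothetical and only mimic whitespace tokenization with and without lowercasing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, plain Java only. It imitates whitespace
// tokenization; it is not Lucene's WhitespaceAnalyzer.
final class CaseSketch {
    // Split on runs of whitespace, keeping each token exactly as typed.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Same split, then lowercase, so index and query terms agree on case.
    static List<String> lowercasedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : whitespaceTokens(text)) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> asTyped = whitespaceTokens("My Name Is ABC123");
        System.out.println(asTyped.contains("abc123"));   // false: case differs

        List<String> folded = lowercasedTokens("My Name Is ABC123");
        String queryTerm = lowercasedTokens("Abc123").get(0);
        System.out.println(folded.contains(queryTerm));   // true: both lowercased
    }
}
```

The same folding has to be applied on both the indexing and the query side, otherwise the terms still will not line up.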

	Erik


>
> thanks,
> pashupathinath.k
>
> --- Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>
>> On Mar 31, 2005, at 11:44 AM, pashupathinath wrote:
>>
>>>   is it possible to index using a predefined analyzer
>>> and search using a custom analyzer ??
>>
>> Yes, it's perfectly fine to do so, with the caveat that you end up
>> searching for the terms exactly as they were indexed.
>>
>> I end up doing this in most applications, actually, primarily because
>> untokenized fields need to use the KeywordAnalyzer during searching.
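
A rough plain-Java sketch of why an untokenized field needs keyword-style analysis at query time. Nothing here is Lucene API; the field names "id" and "contents" and all class and method names are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-field query analysis, plain Java only.
final class PerFieldSketch {
    // Keyword-style analysis: the whole value is a single term.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // Tokenized analysis: lowercase, then split on whitespace.
    static List<String> tokenized(String value) {
        List<String> out = new ArrayList<>();
        for (String t : value.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Assumed schema: "id" was indexed untokenized, "contents" was tokenized.
    static final Map<String, Function<String, List<String>>> QUERY_ANALYZERS = new HashMap<>();
    static {
        QUERY_ANALYZERS.put("id", PerFieldSketch::keyword);
        QUERY_ANALYZERS.put("contents", PerFieldSketch::tokenized);
    }

    // Pick the analysis by field; unknown fields fall back to tokenized.
    static List<String> queryTerms(String field, String text) {
        return QUERY_ANALYZERS.getOrDefault(field, PerFieldSketch::tokenized).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(queryTerms("id", "DOC 42 A"));        // [DOC 42 A]
        System.out.println(queryTerms("contents", "DOC 42 A"));  // [doc, 42, a]
    }
}
```

If the "id" query were tokenized instead, it would produce three terms, none of which equals the single indexed term "DOC 42 A".
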
>>
>>>   I'm searching using the built-in WhitespaceAnalyzer. The problem
>>> is that when I search for part of a string, there are zero results.
>>> For example, if the statement is "my name is abc123", a search for
>>> abc or 123 doesn't return any hits.
>>>   Any insight into this?
>>
>> The exact terms indexed using WhitespaceAnalyzer are like this (using
>> the Lucene in Action AnalyzerDemo - "ant AnalyzerDemo"):
>>
>>      [input] String to analyze: [This string will be analyzed.]
>> my name is abc123
>>       [echo] Running lia.analysis.AnalyzerDemo...
>>       [java] Analyzing "my name is abc123"
>>       [java]   WhitespaceAnalyzer:
>>       [java]     [my] [name] [is] [abc123]
>>
>>       [java]   SimpleAnalyzer:
>>       [java]     [my] [name] [is] [abc]
>>
>>       [java]   StopAnalyzer:
>>       [java]     [my] [name] [abc]
>>
>>       [java]   StandardAnalyzer:
>>       [java]     [my] [name] [abc123]
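
The first three results above can be approximated in plain Java. This is only a sketch of the tokenization rules as I understand them, not Lucene's code: the class name is hypothetical, and the stop word set is a small illustrative subset rather than the real list. StandardAnalyzer is left out because its grammar is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical approximations, plain Java only.
final class AnalyzerApproximations {
    // Illustrative subset only; NOT Lucene's actual stop word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    // Whitespace-style: split on whitespace, keep tokens untouched.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Simple-style: runs of letters, lowercased. Digits and punctuation
    // act as separators, which is why "abc123" becomes just "abc".
    static List<String> letters(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    // Stop-style: the letter tokens with stop words removed.
    static List<String> lettersMinusStops(String text) {
        List<String> out = new ArrayList<>(letters(text));
        out.removeAll(STOP_WORDS);
        return out;
    }

    public static void main(String[] args) {
        String s = "my name is abc123";
        System.out.println(whitespace(s));         // [my, name, is, abc123]
        System.out.println(letters(s));            // [my, name, is, abc]
        System.out.println(lettersMinusStops(s));  // [my, name, abc]
    }
}
```
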
>>
>> So you indexed "abc123", and searches must search for that term
>> *exactly*.  You can search for "abc*" as a PrefixQuery or
>> WildcardQuery and find "abc123".  "*123" will also find it, though
>> QueryParser does not support leading wildcard characters (but the
>> API does).  Wildcard queries are not ideally what you want, as they
>> tend to be much slower for large indexes.
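
One way to picture the cost difference, as a sketch only: a sorted set stands in for the index's term list here. This is not how Lucene enumerates terms internally, and every name below is hypothetical. An exact term is a direct lookup, a prefix can seek to its starting point in sorted order, and a leading wildcard has to look at every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a sorted term list, plain Java only.
final class TermLookupSketch {
    final TreeSet<String> terms = new TreeSet<>();

    TermLookupSketch(List<String> indexedTerms) {
        terms.addAll(indexedTerms);
    }

    // Exact term: a single lookup.
    boolean exact(String term) {
        return terms.contains(term);
    }

    // Prefix ("abc*"): seek to the first term >= prefix, then walk
    // forward only while terms still start with the prefix.
    List<String> prefix(String prefix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break;
            }
            hits.add(t);
        }
        return hits;
    }

    // Leading wildcard ("*123"): sort order does not help, so every
    // term is examined. This is the part that grows with index size.
    List<String> suffix(String suffix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = new ArrayList<>();
        indexed.add("my");
        indexed.add("name");
        indexed.add("is");
        indexed.add("abc123");
        TermLookupSketch index = new TermLookupSketch(indexed);
        System.out.println(index.exact("abc"));     // false
        System.out.println(index.exact("abc123"));  // true
        System.out.println(index.prefix("abc"));    // [abc123]
        System.out.println(index.suffix("123"));    // [abc123]
    }
}
```
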
>>
>> You may need to do specialized analysis.  Perhaps you could share
>> your real needs with the list and we could offer recommendations.
>> It is possible to index "abc123", "abc", and "123" all within the
>> same position in the index if you do some clever analysis and that
>> meshes with what you're after.
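
A minimal sketch of that idea in plain Java, assuming the splitting rule is "break wherever letters and digits meet". The rule, the Token class, and the method names are all hypothetical. In Lucene itself, stacking tokens like this is normally expressed by giving the extra tokens a position increment of zero in a custom token filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analysis sketch, plain Java only.
final class SubTokenSketch {
    // A term together with the position it occupies in the field.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    // Emit each whitespace-separated word, plus its letter/digit parts
    // at the SAME position when the word mixes the two.
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        int position = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(new Token(word, position));
            List<String> parts = letterDigitRuns(word);
            if (parts.size() > 1) {
                for (String part : parts) {
                    out.add(new Token(part, position));
                }
            }
            position++;
        }
        return out;
    }

    // Break a word at every digit/non-digit boundary: "abc123" -> [abc, 123].
    static List<String> letterDigitRuns(String word) {
        List<String> runs = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (i > 0 && Character.isDigit(c) != Character.isDigit(word.charAt(i - 1))) {
                runs.add(cur.toString());
                cur.setLength(0);
            }
            cur.append(c);
        }
        if (cur.length() > 0) {
            runs.add(cur.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // [my@0, name@1, is@2, abc123@3, abc@3, 123@3]
        System.out.println(analyze("my name is abc123"));
    }
}
```

With terms stacked this way, exact searches for "abc123", "abc", or "123" would each match, at the cost of a larger index.
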
>>
>> 	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

