lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: AnalyZer HELP Please
Date Wed, 18 Aug 2004 10:17:23 GMT


On Aug 18, 2004, at 3:41 AM, Karthik N S wrote:
> Hi Guys
>
>   Finally with lot's experimentation, I came to know that
>
>     A word  such as  'new'  already present in  Analyzer,
>
>     will  not return  any hits [ Even when enclosed with Quotes "\""]
>
>     such as  "New Year"....
>
>
>    That's really Intresting....    :(


That's why it's call stop word *removal*.  The purpose of removing 
words is to save space and eliminate words that are ultra common.  
Tuning the analysis process to your domain/environment is by far the 
trickiest part of using Lucene, and often is not even much of a 
consideration as the built-in Analyzers suffice.  It sounds to me that 
your stop word list is far too aggressive and you should consider 
trimming down the list of words that are removed.

Or, even consider not removing words at all.  From the What Would 
Google Do (WWGD)? category.... does Google remove stop words?  I'll 
leave that as a rhetorical question for now :)

	Erik

>
>
> Thx
> Karthik
>
>
>
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Tuesday, August 17, 2004 7:35 PM
> To: Lucene Users List
> Subject: Re: AnalyZer HELP Please
>
>
> On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
>> I did as Erik  replied in his mail ,
>> and  searched for the complete word   "\"New Year\""  ,
>> but the QueryParser Still returns me hit for "Year"  Only.
>>
>> [ The Analyzer I use has 555 English Stop words  with  "new" present
>> in it ]
>
> No wonder!
>
>> That's when I checked up with Analyzer's to verify,
>> If u look at the list  Analyzer's  o/p
>> GrammerAnalyzer is the one that has 555 English STOPWORDS.
>>
>> Do u think this is the bug in my Code.
>
> Whether this is a "bug" or not is really for your users to determine :)
>   But it is absolutely the expected behavior.  QueryParser analyzes the
> expression too.  Even if you somehow changed QueryParser, if you never
> indexed the word "new" then you certainly cannot expect to search on it
> and find it.
>
> 	Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message