lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Analyzer question
Date Mon, 08 Aug 2005 15:00:56 GMT
On Aug 8, 2005, at 10:43 AM, Dan Armbrust wrote:
> It is my understanding that the StandardAnalyzer will remove  
> underscores - so "some_word" would be indexed as 'some' and 'word'.
>
> I want to keep the underscores, so I was thinking of changing over  
> to an Analyzer that uses the WhitespaceTokenizer, LowerCaseFilter,  
> and StopFilter.
>
> What other tokenizing magic will I lose by changing away from the  
> StandardAnalyzer?

The best thing you can do is set up a test environment to try out  
sample text with various analyzers.  Lucene in Action's source code  
(http://www.lucenebook.com) comes with such a demo that you can  
easily tweak.  Here's a sample of running "ant AnalyzerDemo":

      [echo] Running lia.analysis.AnalyzerDemo...
      [java] Analyzing "some_word"
      [java]   WhitespaceAnalyzer:
      [java]     [some_word]

      [java]   SimpleAnalyzer:
      [java]     [some] [word]

      [java]   StopAnalyzer:
      [java]     [some] [word]

      [java]   StandardAnalyzer:
      [java]     [some] [word]

      [java]   SnowballAnalyzer:
      [java]     [some] [word]

      [java]   SnowballAnalyzer:
      [java]     [some] [word]

      [java]   SnowballAnalyzer:
      [java]     [some] [word]
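
To make the difference in the output above concrete, here is a rough sketch in plain Java of the two splitting strategies involved. It is not Lucene code and does not use Lucene's API: the class name TokenizerSketch, its method names, and the tiny stop list are all made up for illustration. whitespaceTokens approximates a whitespace-only split, letterTokens approximates a split on any non-letter with lowercasing, and whitespaceLowerStop approximates the whitespace / lowercase / stop-word chain proposed in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in only; none of these names exist in Lucene.
class TokenizerSketch {
    // Hypothetical stop list for the example, not Lucene's default list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    /** Splits on runs of whitespace only, so characters such as '_' stay inside the token. */
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Keeps only runs of letters, lowercased; any non-letter (including '_') ends the token. */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Whitespace split, then lowercase, then drop stop words. */
    static List<String> whitespaceLowerStop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : whitespaceTokens(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Some_Word example";
        System.out.println("whitespace:            " + whitespaceTokens(text));
        System.out.println("letters only:          " + letterTokens(text));
        System.out.println("whitespace+lower+stop: " + whitespaceLowerStop(text));
    }
}
```

One thing this sketch makes visible: a whitespace-only split keeps the underscore, but it also keeps any other punctuation attached to a token, so "word," and "word" come out as different terms. Whether that matters depends on the text being indexed, which is why trying sample text against each analyzer is worthwhile.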
