lucene-general mailing list archives

From Grant Ingersoll <>
Subject Re: SomeBody Help Me! To Find Out The Portuguese Analyzer
Date Wed, 07 Jan 2009 14:14:50 GMT

On Jan 6, 2009, at 7:26 PM, 이지홍 wrote:

> Thanks for your answers.
> I'm sorry, my English writing is not good.
> I was telling you about the Lucene Sandbox analyzers.
> You can find them
> at the following URL:
> If you go there, you can find GermanAnalyzer and FrenchAnalyzer.
> I will ask my question again:
> How is a Lucene Sandbox analyzer different from SnowballAnalyzer?

I would suggest looking at the code.  I haven't ever investigated them  
at a low-level.  If I had to guess, I bet they just have different  
approaches to how stemming is done.  Chances are neither is right or  
wrong and there is no such thing as a perfect stemmer.

If I were you, I would set up a small program that takes in some number  
of Strings from your documents in each of the languages and then runs  
them through each Analyzer, printing out the tokens.  I have a  
_SAMPLE_ of this in my Lucene Boot Camp training code:

> I don't know that.
> Which one is best?

Best for what?  It's going to depend.

> Can you confirm that the Snowball analyzer covers the English language?


Analyzer analyzer = new SnowballAnalyzer("English");

> Please teach me.

Please have a look through more of the documentation and try some  
things out.

A simple:
  Analyzer analyzer = new //FILL IN YOUR ANALYZER HERE
  TokenStream stream = analyzer.tokenStream("foo",
      new StringReader("Test String Goes here"));
  Token token = new Token();
  // next(Token) returns null once the stream is exhausted
  while ((token = stream.next(token)) != null) {
    System.out.println("Token: " + token);
  }

will go a long way in your understanding of how these Analyzers work.
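
The side-by-side comparison suggested earlier can be sketched without Lucene on the classpath. This is only a rough illustration of the harness shape, not the Boot Camp sample: the two "analyzers" here (simpleTokens and crudeStemTokens) are hypothetical toy stand-ins written for this example, and the sample strings are made up. In a real test, each map entry would instead wrap a Lucene Analyzer and collect the tokens it emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Stand-alone sketch: the "analyzers" below are toy stand-ins, NOT Lucene classes.
class AnalyzerComparison {

    // Splits on anything that is not a letter or digit and lowercases each piece.
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Same as simpleTokens, plus a deliberately crude "stemmer" that drops a
    // trailing "s" from tokens longer than three characters.
    static List<String> crudeStemTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : simpleTokens(text)) {
            boolean plural = token.length() > 3 && token.endsWith("s");
            tokens.add(plural ? token.substring(0, token.length() - 1) : token);
        }
        return tokens;
    }

    // Runs every sample through every analyzer; returns one report line per pair.
    static List<String> compare(Map<String, Function<String, List<String>>> analyzers,
                                List<String> samples) {
        List<String> report = new ArrayList<>();
        for (String sample : samples) {
            for (Map.Entry<String, Function<String, List<String>>> entry : analyzers.entrySet()) {
                report.add(entry.getKey() + " | " + sample + " -> " + entry.getValue().apply(sample));
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the analyzers in insertion order in the report.
        Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
        analyzers.put("simple", AnalyzerComparison::simpleTokens);
        analyzers.put("crude-stem", AnalyzerComparison::crudeStemTokens);

        // Made-up sample strings; use text from your own documents instead.
        List<String> samples = Arrays.asList("Os livros antigos", "The running dogs");
        for (String line : compare(analyzers, samples)) {
            System.out.println(line);
        }
    }
}
```

Reading the output line by line for the same input makes it easy to see where two analyzers disagree, which is usually more informative than asking which one is "best" in the abstract.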

I am doing Lucene Boot Camp at ApacheCon in Amsterdam, Netherlands in  
March.  If you can't make that, I suggest you buy the most excellent  
"Lucene In Action" by Erik, Otis and Mike M. ( 
).  Otherwise, there are plenty of tutorials and articles on using  
Lucene at and on the Wiki  
itself: which will cover how to  
use an analyzer.

You might also check out Solr's Admin UI, which has a built-in way of  
outputting tokens to the screen given some user input in a text box.   
See the Solr project for more on that.

Good Luck,