lucene-general mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SomeBody Help Me! To Find Out The Portuguese Analyzer
Date Wed, 07 Jan 2009 14:14:50 GMT

On Jan 6, 2009, at 7:26 PM, 이지홍 wrote:

> Thanks for your answers.
> I'm sorry, my English writing is not good.
> I told you about the Lucene Sandbox analyzers.
> You can find them at the following URL:
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/
> Go there and you can find GermanAnalyzer and FrenchAnalyzer.
>
> I will ask you again.
>
> How are the Lucene Sandbox analyzers different from SnowballAnalyzer?

I would suggest looking at the code.  I have never investigated them
at a low level.  If I had to guess, I'd bet they just take different
approaches to stemming.  Chances are neither is right or wrong;
there is no such thing as a perfect stemmer.

If I were you, I would set up a small program that takes some number
of strings from your documents in each language, runs them through
each Analyzer, and prints out the tokens.  I have a _SAMPLE_ of this
in my Lucene Boot Camp training code: http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/AnalyzerTest.java
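A minimal sketch of such a comparison harness, in Java, follows. It deliberately assumes nothing from Lucene so that it runs with only the JDK: the two "analyzers" (simpleTokens and naiveStemTokens) are hypothetical toy stand-ins, and the class and method names are illustrative only. What matters is the shape: every sample string goes through every analyzer, and the resulting token lists are printed next to each other so differences in tokenization or stemming stand out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical side-by-side analyzer comparison harness.
 * The two analyzers are toy stand-ins, NOT Lucene classes; in practice each
 * map entry would wrap a real Lucene Analyzer.
 */
public class AnalyzerComparison {

    // Toy analyzer 1: split on runs of non-letters, then lowercase each token.
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Toy analyzer 2: same as above, plus a naive rule that strips a trailing
    // "s" from tokens longer than three characters. This is not a real stemmer.
    static List<String> naiveStemTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : simpleTokens(text)) {
            boolean strip = t.length() > 3 && t.endsWith("s");
            tokens.add(strip ? t.substring(0, t.length() - 1) : t);
        }
        return tokens;
    }

    // Runs every sample through every analyzer; one report line per pair.
    static List<String> compare(Map<String, Function<String, List<String>>> analyzers,
                                List<String> samples) {
        List<String> report = new ArrayList<>();
        for (String sample : samples) {
            for (Map.Entry<String, Function<String, List<String>>> e : analyzers.entrySet()) {
                report.add(e.getKey() + " | " + sample + " -> " + e.getValue().apply(sample));
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the analyzers in insertion order in the report.
        Map<String, Function<String, List<String>>> analyzers = new LinkedHashMap<>();
        analyzers.put("simple", AnalyzerComparison::simpleTokens);
        analyzers.put("naive-stem", AnalyzerComparison::naiveStemTokens);

        List<String> samples = Arrays.asList("Test String Goes here", "Running runners ran");
        for (String line : compare(analyzers, samples)) {
            System.out.println(line);
        }
    }
}
```

For the first sample the toy stemmer turns "goes" into "goe", a small reminder that no stemmer is perfect. To compare real analyzers, each function would instead run a Lucene Analyzer's TokenStream over the text and collect each token's text into the list, as the token-printing loop later in this message does.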


>
>
> I don't know that.
>
> Which one is best?

Best for what?  It's going to depend on your content and your application.


>
>
> Can you confirm that SnowballAnalyzer covers the English language?


Yes.

Analyzer analyzer = new SnowballAnalyzer("English");

>
>
> Please teach me.
>

Please have a look through more of the documentation and try some  
things out.

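Something as simple as:
  import java.io.StringReader;
  import org.apache.lucene.analysis.*;  // Analyzer, Token, TokenStream
  import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

  // Swap in whichever Analyzer you want to inspect.
  Analyzer analyzer = new SnowballAnalyzer("English");
  TokenStream stream = analyzer.tokenStream("foo",
      new StringReader("Test String Goes here"));
  final Token reusableToken = new Token();
  for (Token token = stream.next(reusableToken); token != null;
       token = stream.next(reusableToken)) {
    System.out.println("Token: " + token.term());
  }
  stream.close();

will go a long way toward understanding how these Analyzers work.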


I am teaching Lucene Boot Camp at ApacheCon in Amsterdam, Netherlands, in
March.  If you can't make that, I suggest you buy the most excellent
"Lucene In Action" by Erik, Otis and Mike M. (http://www.manning.com/hatcher3).
Otherwise, there are plenty of tutorials and articles on using
Lucene at http://wiki.apache.org/lucene-java/Resources and on the Wiki
itself, http://wiki.apache.org/lucene-java/, which cover how to
use an analyzer.

You might also check out Solr's Admin UI, which has a built-in way of
displaying the tokens produced for text you type into a text box.
See the Solr project for more on that.

Good Luck,
Grant