lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Stop words filter
Date Wed, 23 Jun 2010 06:35:47 GMT
Hi Vinicius,

You should read the Package-Level Docs:
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/package-summary.html

To get the token attributes, add the attributes you need to your TokenStream
using addAttribute(). Each call returns an attribute instance that is updated
in place for every token while you iterate with incrementToken(), so you have
easy access to the values of each token.
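
As a rough sketch of the above against the 3.0.x API: the class name TokenDump
and the helper tokens() are made up for illustration, and StandardAnalyzer is
only an example choice of analyzer. TermAttribute is the term-text attribute
in 3.0.x; later releases use CharTermAttribute instead. It needs the Lucene
core JAR on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper class, not part of Lucene.
public class TokenDump {

    // Runs the text through the analyzer and collects the term text of each token.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        TokenStream stream = analyzer.tokenStream("", new StringReader(text));
        // Register the attribute once; the same instance is refilled for every token.
        TermAttribute term = stream.addAttribute(TermAttribute.class);
        List<String> result = new ArrayList<String>();
        try {
            stream.reset();
            while (stream.incrementToken()) {
                result.add(term.term());
            }
            stream.end();
        } finally {
            stream.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // StandardAnalyzer lowercases and drops its default English stop words.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        for (String token : tokens(analyzer, "The quick brown foxes")) {
            System.out.println(token);
        }
    }
}
```

StandardAnalyzer does not stem. For stemming you would swap in an analyzer
whose filter chain includes a stemmer; the loop itself stays the same.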

If you want to write your own Tokenizer, start by studying one of the provided
ones and follow the same pattern. The test cases for the existing analyzers
are also a good way to see how they are used. Good helper methods for testing
TokenStreams/Analyzers are in the test package's class
BaseTokenStreamTestCase: assertTokenStreamContents() and assertAnalyzesTo().

The book Lucene in Action (2nd edition) also gives good examples.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Vinicius Carvalho [mailto:viniciusccarvalho@gmail.com]
> Sent: Wednesday, June 23, 2010 4:50 AM
> To: java-user@lucene.apache.org
> Subject: Stop words filter
> 
> Hello there! I've been using Lucene as a full-text search solution for some
> time, and although I'm familiar with Analyzers and Stemmers, I have never
> used them directly.
> 
> I'm running a few experiments on Sentiment Analysis, and our
> implementation needs to perform stemming and stop word removal. I
> thought of using Lucene's built-in support to save some coding time.
> 
> Is there any example? I'm trying
> 
> TokenStream stream = analyzer.tokenStream("", new StringReader(inputStr));
> 
> Problem is that I could not find a way to get the result tokens. I was
> expecting something like stream.getTokens:Token[] :P
> 
> Could someone point me in the right direction?
> 
> Regards
> 
> --
> The intuitive mind is a sacred gift and the rational mind is a faithful
> servant. We have created a society that honors the servant and has
> forgotten the gift.



