lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Using Stemmers
Date Mon, 05 Mar 2007 22:43:57 GMT
Hi Mathieu,

You can't add TokenFilters to an existing Analyzer. However,
implementing an Analyzer that acts just like the StandardAnalyzer
plus your Stemmer is pretty straightforward.
StandardAnalyzer.tokenStream() looks like:
   /** Constructs a {@link StandardTokenizer} filtered by a {@link
       StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
   public TokenStream tokenStream(String fieldName, Reader reader) {
     TokenStream result = new StandardTokenizer(reader);
     result = new StandardFilter(result);
     result = new LowerCaseFilter(result);
     result = new StopFilter(result, stopSet);
     // ADD your stemming filter here, or one line above if your stop word
     // list works off of stemmed words
     return result;
   }

So just create a new Analyzer that has these same filters, plus your  
stemming TokenFilter.  Looking at the source of SnowballAnalyzer  
(contrib/snowball) may also be useful.

FWIW, it is not that hard to make a "configurable" analyzer similar  
to what Solr does, if you find you need to change the filters in your  
analyzer a lot.
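
To make the "configurable" idea concrete, here is a minimal sketch of the shape such an analyzer could take. It is deliberately NOT Lucene or Solr API: the class ConfigurableChain, its methods, the whitespace tokenizer and the toy stemmer are all hypothetical stand-ins so the example runs on plain Java (8 or later). In a real analyzer each step would wrap a TokenStream in a TokenFilter, and the stemmer would be a real one such as Porter or Snowball.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for a configurable analyzer; not Lucene API. */
public class ConfigurableChain {
    // Each filter wraps the stream produced by the previous step, the way
    // each TokenFilter wraps the TokenStream that comes before it.
    private final List<UnaryOperator<Stream<String>>> filters = new ArrayList<>();

    /** Appends a filter; filters run in the order they were added. */
    public ConfigurableChain add(UnaryOperator<Stream<String>> filter) {
        filters.add(filter);
        return this;
    }

    /** Splits on whitespace (toy tokenizer), then applies the filters in order. */
    public List<String> analyze(String text) {
        Stream<String> result =
                Arrays.stream(text.trim().split("\\s+")).filter(t -> !t.isEmpty());
        for (UnaryOperator<Stream<String>> filter : filters) {
            result = filter.apply(result);
        }
        return result.collect(Collectors.toList());
    }

    public static UnaryOperator<Stream<String>> lowerCase() {
        return s -> s.map(t -> t.toLowerCase(Locale.ROOT));
    }

    public static UnaryOperator<Stream<String>> stop(Set<String> stopWords) {
        return s -> s.filter(t -> !stopWords.contains(t));
    }

    // Toy stemmer for illustration only: strips a single trailing "s".
    public static UnaryOperator<Stream<String>> toyStem() {
        return s -> s.map(t ->
                t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        ConfigurableChain chain = new ConfigurableChain()
                .add(lowerCase())
                .add(stop(stopWords))
                .add(toyStem());
        System.out.println(chain.analyze("The Stemmers of Lucene")); // prints [stemmer, lucene]
    }
}
```

The order of the add() calls plays the same role as the order of the filter lines in tokenStream(): adding the stemming step before the stop step makes the stop list match stemmed words, while adding it after makes the stop list match the original words.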

Cheers,
Grant


On Mar 5, 2007, at 1:25 PM, DECAFFMEYER MATHIEU wrote:

>
> Hi,
> This is a very simple question, but I just can't find the
> resources I need ...
> I am using the StandardAnalyzer :
> StandardAnalyzer stdAnalyzer;
> if ((stopWordList != null) && (stopWordList.length != 0)) {
> stdAnalyzer = new StandardAnalyzer(stopWordList);
> } else {
> stdAnalyzer = new StandardAnalyzer();
> }
> What I want to achieve is to be able to use an English stemmer,
> but I can't find any method to associate my stemmer with my Analyzer.
> I appreciate any help, thank you.
>

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


