lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <ansh...@gmail.com>
Subject Re: Editing StopWordList
Date Tue, 21 Dec 2010 10:31:41 GMT
The default stopword list is final and so you'd have to copy this to some
other object and add terms to it instead of being able to directly modify
it. So all said and done, what I meant when I said you could use it was that
you could read that object and construct your own set (almost as case 2
below).


--
Anshum Gupta
http://ai-cafe.blogspot.com


On Tue, Dec 21, 2010 at 3:54 PM, manjula wijewickrema
<manjula53@gmail.com>wrote:

> Hi Gupta,
>
> Thanx a lot for your reply. But I could not understand whether I could
> modify (adding more words) to the default stop word list or should I have
> to
> make a new list as an array as follows.
> * public string[] NEW_STOP_WORDS = { "a", "and", "are", "as", "at", "be",
> "but", "by", "for", "if", "in", "into", "is", "no", "not", "of", "on",
> "or",
>
> "s", "such", "t", "that", "the", "their", "then", "there", "these", "they",
> "this", "to", "was", "will", "with",
> "inc","incorporated","co.","ltd","ltd.", "we", "you", "your", "us",
> etc...};
> *
> then call it as follows,
>
> SnowballAnalyzer analyzer = *new* SnowballAnalyzer("English",
> StopAnalyzer.NEW_STOP_WORDS );
> Am I correct?
> Or if not could you explain me how can I do this?
>
> Thanx in advance.
> Manjula.
>
> On Tue, Dec 21, 2010 at 10:36 AM, Anshum <anshumg@gmail.com> wrote:
>
> > Hi Manjula,
> > You could initialize the Analyzer using a modified stop word set. Use
> > the *StopAnalyzer.ENGLISH_STOP_WORDS_SET
> > *to get the default stopset and then add your own words to it. You could
> > then initialize the analyzer using this new stop set instead of the
> default
> > stop set.
> > Hope that helps.
> >
> > --
> > Anshum Gupta
> > http://ai-cafe.blogspot.com
> >
> >
> > On Tue, Dec 21, 2010 at 9:20 AM, manjula wijewickrema
> > <manjula53@gmail.com>wrote:
> >
> > > Hi,
> > >
> > > 1) In my application, I need to add more words to the stop word list.
> > > Therefore, is it possible to add more words into the default lucene
> stop
> > > word list?
> > >
> > > 2) If is it possible, then how can I do this?
> > >
> > > Appreciate any comment from you.
> > >
> > > Thanks,
> > > Manjula.
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message