lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Identify stopwords using TF-IDF
Date Sun, 23 Jun 2019 04:43:15 GMT
Don’t remove stopwords. That was a useful hack when we were running search engines on 16-bit
machines. These days, it causes more problems than it solves.
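One commonly cited example of the kind of problem meant here: a query made up of common words can lose most or all of its meaning once stopwords are stripped. The sketch below assumes a toy English stopword list and a hypothetical `remove_stopwords` helper with plain whitespace tokenization. It is not Solr's stop filter configuration.

```python
# Hypothetical toy stopword list (an assumption for illustration only,
# not taken from any Solr/Lucene configuration).
STOPWORDS = {"to", "be", "or", "not", "the", "a", "of", "and"}


def remove_stopwords(query: str) -> list[str]:
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [t for t in query.lower().split() if t not in STOPWORDS]


# A query built only from stopwords is reduced to nothing, and a band
# name like "The Who" keeps only its least distinctive half.
for q in ["to be or not to be", "The Who", "football results"]:
    print(repr(q), "->", remove_stopwords(q))
```

With the list above, "to be or not to be" becomes an empty query, so no amount of ranking can recover what the user asked for.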

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 22, 2019, at 8:14 PM, akash jayaweera <akash.jayaweera@gmail.com> wrote:
> 
> Hello All,
> I'm trying to identify stopwords for a non-English corpus using TF-IDF
> scores. I calculated the score for each unique term in the corpus. My
> question is how to select stopwords using that score.
> For example, in a corpus about football, the term "football" gets the
> lowest TF-IDF score. But for my requirement, I don't want "football" to
> be identified as a stopword.
> How can I reliably identify stopwords? Is there a simpler method than
> the TF-IDF score?
> 
> Regards,
> *Akash Jayaweera.*
> 
> 
> E akash.jayaweera@gmail.com
> M +94 77 2472635
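The "football" effect in the question follows from how IDF works: a term's IDF is lowest when it appears in the most documents, so in a topical corpus the topic word ranks next to function words. The sketch below ranks terms by document frequency (the quantity low IDF reflects) and flags those appearing in at least a given fraction of documents. The `min_df_ratio` threshold, the manual `keep` list, the whitespace tokenization, and the four sample documents are all assumptions for illustration, not a method from this thread.

```python
from collections import Counter


def document_frequency(docs: list[str]) -> Counter:
    """Count in how many documents each term appears (not total occurrences)."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def candidate_stopwords(docs: list[str], min_df_ratio: float = 0.8,
                        keep: frozenset = frozenset()) -> list[str]:
    """Terms found in at least min_df_ratio of the documents,
    minus any terms the user explicitly wants to keep."""
    n = len(docs)
    df = document_frequency(docs)
    return sorted(
        term for term, count in df.items()
        if count / n >= min_df_ratio and term not in keep
    )


# Hypothetical sample corpus about football.
docs = [
    "the football match was played in the rain",
    "a football club signed the striker",
    "the referee stopped the football game",
    "fans of the club sang in the stadium",
]

print(candidate_stopwords(docs, 0.75))                      # ['football', 'the']
print(candidate_stopwords(docs, 0.75, keep={"football"}))   # ['the']
```

Document frequency alone cannot tell "football" apart from "the" here; some outside judgment, such as a hand-maintained keep list, is still needed.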

