lucene-solr-user mailing list archives

From akash jayaweera <akash.jayawe...@gmail.com>
Subject Identify stopwords using TF-IDF
Date Sun, 23 Jun 2019 03:14:37 GMT
Hello All,
I'm trying to identify stopwords for a non-English corpus using TF-IDF
scores. I calculated the score for each unique term in the corpus, but my
question is: how can I select stopwords based on these scores?
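One way to turn per-term TF-IDF values into a ranked list of stopword candidates is sketched here. It assumes pre-tokenized documents, tf = count / document length, unsmoothed idf = log(N / df), and averaging each term's TF-IDF over all documents to get a single corpus-level score (the message does not say how the scores were aggregated). The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_term_scores(docs):
    """Return one corpus-level score per term: its TF-IDF averaged over all documents.

    Assumptions (not from the original message): docs is a list of
    pre-tokenized documents, tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    totals = Counter()
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(doc).items():
            idf = math.log(n_docs / df[term])
            totals[term] += (count / len(doc)) * idf
    return {term: totals[term] / n_docs for term in df}


def lowest_scoring_terms(scores, k):
    """Return the k lowest-scoring terms as stopword candidates (ties broken alphabetically)."""
    return sorted(scores, key=lambda term: (scores[term], term))[:k]


# Toy football corpus (illustrative English tokens, not the author's data).
docs = [
    ["the", "match", "football"],
    ["the", "goal", "football"],
    ["a", "football", "club"],
]
scores = tfidf_term_scores(docs)
print(lowest_scoring_terms(scores, 2))  # ['football', 'the']
```

On this toy corpus "football" occurs in every document, so it ranks even below the function word "the". This is the same problem the question raises: taking the k lowest scores mixes topic words in with true stopwords.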
For example, in a corpus about football, the term "football" gets the lowest
TF-IDF score. However, for my purposes I don't want "football" to be
identified as a stopword.
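Assuming the standard unsmoothed idf was used (N = number of documents, df(t) = number of documents containing t), the low score for "football" follows directly from the formula. A term that occurs in all N documents gets idf 0, exactly like a true function word, so its TF-IDF is 0 in every document. Smoothed idf variants avoid the exact zero, but because idf decreases as df grows, such a term still receives the smallest possible idf.

```latex
\mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\,\mathrm{idf}(t)

\mathrm{df}(\text{football}) = N
\;\Longrightarrow\; \mathrm{idf}(\text{football}) = \log 1 = 0
\;\Longrightarrow\; \mathrm{tfidf}(\text{football}, d) = 0 \ \text{for every document } d
```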
How can I reliably identify stopwords? Is there a simpler method for
identifying stopwords than the TF-IDF score?
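One commonly suggested alternative is to compare term frequencies against a background corpus from unrelated domains. This technique is swapped in here and is not something the original message proposes. Function words tend to be frequent in every domain, while a topic word such as "football" is frequent only in its own corpus. The sketch assumes pre-tokenized documents; the function names and the `min_freq` threshold are hypothetical.

```python
from collections import Counter


def relative_frequencies(docs):
    """Map each term to its share of all tokens in the corpus."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def cross_corpus_stopwords(domain_docs, background_docs, min_freq):
    """Terms at least min_freq frequent in BOTH corpora, most frequent first.

    min_freq is a hypothetical threshold you would tune on your own data.
    """
    dom = relative_frequencies(domain_docs)
    bg = relative_frequencies(background_docs)
    shared = [t for t in dom if t in bg and dom[t] >= min_freq and bg[t] >= min_freq]
    # Rank by the smaller of the two frequencies so a term must be common everywhere.
    return sorted(shared, key=lambda t: (-min(dom[t], bg[t]), t))


# Toy corpora (illustrative English tokens, not the author's data).
football_docs = [["the", "match", "football"], ["the", "goal", "football"], ["a", "football", "club"]]
cooking_docs = [["the", "soup", "is", "hot"], ["a", "the", "recipe"], ["the", "oven"]]
print(cross_corpus_stopwords(football_docs, cooking_docs, min_freq=0.1))  # ['the', 'a']
```

Here "football" is the most frequent term in the football corpus but never appears in the cooking corpus, so it is not selected, while "the" and "a" are. In practice the background corpus would need to be large and in the same language as the target corpus.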

Regards,
*Akash Jayaweera.*


E akash.jayaweera@gmail.com
M +94 77 2472635
