lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)
Date Tue, 20 Oct 2009 12:02:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1993.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: 3.0

Thanks Christian!

> MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1993
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.9
>            Reporter: Christian Steinert
>            Assignee: Michael McCandless
>             Fix For: 3.0
>
>         Attachments: MoreLikeThis.java.patch
>
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
>
> The MoreLikeThis class can generate a likeness query based on a given document.
> So far, there is no way to keep words that appear in almost all documents out of
> the likeness query, which makes extensive stop-word lists necessary.
> Therefore I suggest allowing the exclusion of words whose document count exceeds a
> given absolute number or a given percentage of all documents. Depending on the text
> corpus, words that appear in more than 50% or even 70% of documents can usually be
> considered insignificant for classifying a document.
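The proposed cut-off can be sketched as follows. This is a minimal, self-contained illustration and not the code from MoreLikeThis.java.patch. The class `DocFreqFilter`, its methods and the sample document frequencies are hypothetical. The sketch also assumes that a term is dropped only when its document frequency strictly exceeds the limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "too common" term filter for a likeness query.
final class DocFreqFilter {

    // Convert a percentage of the corpus into an absolute document count.
    static int maxDocFreqFromPct(int pct, int numDocs) {
        if (pct < 0 || pct > 100) {
            throw new IllegalArgumentException("pct must be in [0, 100]");
        }
        // Long arithmetic avoids int overflow for very large indexes.
        return (int) ((long) numDocs * pct / 100);
    }

    // Keep only terms whose document frequency does not exceed maxDocFreq,
    // preserving the iteration order of the input map.
    static List<String> filterCommonTerms(Map<String, Integer> docFreqs, int maxDocFreq) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() <= maxDocFreq) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for a hypothetical 1000-document index.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 980);
        docFreqs.put("lucene", 120);
        docFreqs.put("index", 640);
        docFreqs.put("moreLikeThis", 15);

        int cutoff = maxDocFreqFromPct(50, 1000);
        System.out.println(cutoff + " " + filterCommonTerms(docFreqs, cutoff));
    }
}
```

With a 50% limit on 1000 documents, the cut-off is 500, so `main` prints `500 [lucene, moreLikeThis]`. Converting the percentage to an absolute count once keeps the per-term check to a single integer comparison, and the same check serves both the absolute-count and percentage variants suggested in the issue.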

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

