mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAHOUT-647) Two small bugs in seq2sparse
Date Thu, 31 Mar 2011 20:27:05 GMT
Two small bugs in seq2sparse
----------------------------

                 Key: MAHOUT-647
                 URL: https://issues.apache.org/jira/browse/MAHOUT-647
             Project: Mahout
          Issue Type: Bug
          Components: Utils
    Affects Versions: 0.4
            Reporter: Vasil Vasilev
            Assignee: Sean Owen
            Priority: Minor
             Fix For: 0.5


From Vasil on the mailing list:

1. The minLLR parameter is not taken into account. The problem is that in
the CollocDriver class

Job job = new Job(conf);

is executed before

conf.setFloat(LLRReducer.MIN_LLR, minLLRValue);

so the value is set too late for the job to pick it up. See the
CollocDriver.computeNGramsPruneByLLR method.
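
To illustrate why the ordering in item 1 matters, here is a minimal sketch in plain Java. It assumes the job takes its own copy of the configuration when it is constructed, so later changes to the original object are not visible to it. SketchConf, SketchJob, and the "minLLR" key are hypothetical stand-ins for illustration only; they are not the Hadoop or Mahout classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object (not the Hadoop class).
class SketchConf {
    private final Map<String, String> values = new HashMap<>();

    SketchConf() {}

    // Copy constructor: takes a snapshot of the other configuration.
    SketchConf(SketchConf other) {
        values.putAll(other.values);
    }

    void setFloat(String key, float value) {
        values.put(key, Float.toString(value));
    }

    float getFloat(String key, float defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Float.parseFloat(v);
    }
}

// Hypothetical stand-in for a job that copies its configuration on construction.
class SketchJob {
    private final SketchConf conf;

    SketchJob(SketchConf conf) {
        this.conf = new SketchConf(conf); // assumption: snapshot, not a shared reference
    }

    SketchConf getConfiguration() {
        return conf;
    }
}

class MinLlrOrderingSketch {
    static final String MIN_LLR = "minLLR"; // illustrative key name

    // Order described in the report: job created first, value set afterwards.
    static float reportedOrder(float minLLRValue) {
        SketchConf conf = new SketchConf();
        SketchJob job = new SketchJob(conf);
        conf.setFloat(MIN_LLR, minLLRValue); // the job's copy never sees this
        return job.getConfiguration().getFloat(MIN_LLR, 0.0f);
    }

    // Reordered: value set first, then the job is created.
    static float reordered(float minLLRValue) {
        SketchConf conf = new SketchConf();
        conf.setFloat(MIN_LLR, minLLRValue);
        SketchJob job = new SketchJob(conf);
        return job.getConfiguration().getFloat(MIN_LLR, 0.0f);
    }

    public static void main(String[] args) {
        System.out.println(reportedOrder(50.0f)); // falls back to the default
        System.out.println(reordered(50.0f));     // sees the configured value
    }
}
```

Under that assumption, the fix amounts to moving the setFloat call above the job construction, or setting the value on the job's own configuration.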

2. The maxDFPercent parameter is not taken into account. The problem is that in
TFIDFPartialVectorReducer.reduce the check is

if (df / vectorCount > maxDfPercent) {
  if (log.isInfoEnabled()) {
    log.info("ommiting {}", e.index());
  }
  continue;
}

and should be:

if (df*100 / vectorCount > maxDfPercent) {
  if (log.isInfoEnabled()) {
    log.info("ommiting {}", e.index());
  }
  continue;
}
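
The arithmetic behind item 2 can be checked with a small sketch. It assumes that df, vectorCount, and maxDfPercent are integer (long) values and that maxDfPercent is expressed on a 0 to 100 scale; the report does not state the types, and the class and method names below are hypothetical.

```java
class MaxDfPercentSketch {
    // Original check: df / vectorCount is a ratio of at most 1, so it
    // cannot exceed a threshold on a 0..100 scale such as 90.
    static boolean omitOriginal(long df, long vectorCount, long maxDfPercent) {
        return df / vectorCount > maxDfPercent;
    }

    // Proposed check: scale the ratio to a percentage before comparing.
    static boolean omitProposed(long df, long vectorCount, long maxDfPercent) {
        return df * 100 / vectorCount > maxDfPercent;
    }

    public static void main(String[] args) {
        long vectorCount = 1000; // assumed number of documents
        long maxDfPercent = 90;  // assumed threshold in percent

        // A term that appears in 950 of 1000 documents (95%).
        System.out.println(omitOriginal(950, vectorCount, maxDfPercent));
        System.out.println(omitProposed(950, vectorCount, maxDfPercent));

        // A term that appears in 500 of 1000 documents (50%).
        System.out.println(omitProposed(500, vectorCount, maxDfPercent));
    }
}
```

With these assumed values, the original check never omits the 95% term, while the proposed check omits it and keeps the 50% term. The same mismatch of scales would apply if the values were floating point, since a ratio of 0.95 is still below 90.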

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira