mahout-dev mailing list archives

From "Sean Owen (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAHOUT-900) RandomSeedGenerator samples / output k texts incorrectly
Date Mon, 28 Nov 2011 10:42:40 GMT
RandomSeedGenerator samples / output k texts incorrectly
--------------------------------------------------------

                 Key: MAHOUT-900
                 URL: https://issues.apache.org/jira/browse/MAHOUT-900
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.5
            Reporter: Sean Owen
            Assignee: Robin Anil
            Priority: Minor
             Fix For: 0.6
         Attachments: MAHOUT-900.patch

          int currentSize = chosenTexts.size();
          if (currentSize < k) {
            chosenTexts.add(newText);
            chosenClusters.add(newCluster);
          } else if (random.nextInt(currentSize + 1) == 0) { // with chance 1/(currentSize+1) pick new element
            int indexToRemove = random.nextInt(currentSize); // evict one chosen randomly
            chosenTexts.remove(indexToRemove);
            chosenClusters.remove(indexToRemove);
            chosenTexts.add(newText);
            chosenClusters.add(newCluster);
          }

The second "if" condition ought to be "!= 0", right? Only if the draw is 0 should we skip the
body, which removes an existing element, since in that case the new element itself is the one evicted.
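
For illustration, here is a minimal sketch of the sampling step with the condition flipped as proposed above. It is not the attached patch; the class name SeedSamplingSketch and the helper offer(...) are hypothetical, and a single generic list stands in for the paired chosenTexts / chosenClusters lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Mahout code: one list stands in for chosenTexts/chosenClusters.
final class SeedSamplingSketch {

  // Offers one new element to a sample that holds at most k elements.
  static <T> void offer(List<T> chosen, T newItem, int k, Random random) {
    int currentSize = chosen.size();
    if (currentSize < k) {
      // Still filling up: always keep the new element.
      chosen.add(newItem);
    } else if (random.nextInt(currentSize + 1) != 0) {
      // A draw of 0 (chance 1/(currentSize+1)) means the new element is the one dropped,
      // so the body is skipped. Any other draw evicts a randomly chosen existing element.
      int indexToRemove = random.nextInt(currentSize);
      chosen.remove(indexToRemove);
      chosen.add(newItem);
    }
  }

  public static void main(String[] args) {
    List<String> chosen = new ArrayList<>();
    Random random = new Random();
    for (String text : new String[] {"a", "b", "c", "d", "e", "f"}) {
      offer(chosen, text, 3, random);
    }
    System.out.println(chosen); // three of the six inputs; which ones varies per run
  }
}
```

With "== 0", as in the quoted code, the body would run only with chance 1/(currentSize+1), so a later element would rarely replace an earlier one.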

Second, this code:

        for (int i = 0; i < k; i++) {
          writer.append(chosenTexts.get(i), chosenClusters.get(i));
        }

... assumes that at least k elements existed in the input, and fails otherwise. Probably need
to cap the loop at the number of elements actually chosen.
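
One way to cap it, as a rough sketch rather than the attached patch: bound the loop by the smaller of k and the number of elements collected. The class CappedOutputSketch and the method emit(...) are hypothetical; a returned list of strings stands in for the writer.append(...) calls, and the lists hold plain strings instead of the real key and value types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Mahout code: collects the pairs that would be written.
final class CappedOutputSketch {

  static List<String> emit(List<String> chosenTexts, List<String> chosenClusters, int k) {
    List<String> written = new ArrayList<>();
    // Never index past the end of the lists when the input had fewer than k elements.
    int limit = Math.min(k, chosenTexts.size());
    for (int i = 0; i < limit; i++) {
      // Stand-in for writer.append(chosenTexts.get(i), chosenClusters.get(i)).
      written.add(chosenTexts.get(i) + "\t" + chosenClusters.get(i));
    }
    return written;
  }

  public static void main(String[] args) {
    List<String> texts = Arrays.asList("t0", "t1");
    List<String> clusters = Arrays.asList("c0", "c1");
    // k = 5 but only two elements exist, so two pairs are emitted and nothing throws.
    for (String line : emit(texts, clusters, 5)) {
      System.out.println(line);
    }
  }
}
```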

Patch attached.


        
