mahout-dev mailing list archives

From "Sean Owen (Created) (JIRA)" <>
Subject [jira] [Created] (MAHOUT-900) RandomSeedGenerator samples / output k texts incorrectly
Date Mon, 28 Nov 2011 10:42:40 GMT

                 Key: MAHOUT-900
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.5
            Reporter: Sean Owen
            Assignee: Robin Anil
            Priority: Minor
             Fix For: 0.6
         Attachments: MAHOUT-900.patch

          int currentSize = chosenTexts.size();
          if (currentSize < k) {
            // ...
          } else if (random.nextInt(currentSize + 1) == 0) { // with chance 1/(currentSize+1) pick new element
            int indexToRemove = random.nextInt(currentSize); // evict one chosen randomly
            // ...
          }

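The second "if" condition ought to be "!= 0", right? The k chosen elements plus the new one
make currentSize + 1 candidates, and one of them should be evicted at random. A draw of 0
corresponds to evicting the new element itself, and only in that case should we skip the body,
which removes an existing element. As written, the new element is picked with chance
1/(currentSize+1) instead of currentSize/(currentSize+1).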
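The eviction rule proposed above can be tried in isolation. This is a minimal sketch, assuming plain java.util lists stand in for the real chosenTexts/chosenClusters pair; EvictionSketch and offer are hypothetical names, not Mahout API. main() estimates how often a new element survives one offer to a full sample of k = 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class EvictionSketch {

  // Hypothetical helper (not Mahout's API): offers one new element to a sample of at most k.
  public static <T> void offer(List<T> chosen, T newElement, int k, Random random) {
    int currentSize = chosen.size();
    if (currentSize < k) {
      chosen.add(newElement);                          // still filling up: always keep
    } else if (random.nextInt(currentSize + 1) != 0) {
      // A draw of 0 (chance 1/(currentSize+1)) stands for evicting the new element,
      // so the sample is left unchanged. Otherwise evict one existing element
      // chosen uniformly at random and keep the new one.
      int indexToRemove = random.nextInt(currentSize);
      chosen.remove(indexToRemove);                    // remove(int index), not remove(Object)
      chosen.add(newElement);
    }
  }

  public static void main(String[] args) {
    int k = 3;
    int trials = 100_000;
    int keptNew = 0;
    Random random = new Random(42);
    for (int t = 0; t < trials; t++) {
      List<Integer> chosen = new ArrayList<>(List.of(0, 1, 2)); // a full sample of size k
      offer(chosen, 99, k, random);
      if (chosen.contains(99)) {
        keptNew++;
      }
    }
    // With "!= 0" the new element should survive about k/(k+1) = 0.75 of the time.
    System.out.printf("new element kept in %.3f of trials%n", keptNew / (double) trials);
  }
}
```

Because currentSize stays at k once the list is full, this rule makes each step a uniform eviction among k + 1 candidates, but it does not give every input record the same chance of appearing in the final sample: later records are favored. Textbook reservoir sampling (Algorithm R) instead keeps the n-th record with probability k/n, where n counts all records seen so far.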

Second, this code:

        for (int i = 0; i < k; i++) {
          writer.append(chosenTexts.get(i), chosenClusters.get(i));
        }

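... assumes that the input contained at least k elements, and fails otherwise: when fewer than
k were read, chosenTexts.get(i) runs past the end of the list. The loop bound probably needs
to be capped at chosenTexts.size().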
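The cap could look like the following. This is a minimal sketch, assuming a BiConsumer stands in for the real writer.append(...) call so the example runs without Hadoop; CappedWriteSketch and writeChosen are hypothetical names, not part of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class CappedWriteSketch {

  // Appends at most k (text, cluster) pairs; `append` is a hypothetical stand-in
  // for writer.append(...). Returns how many pairs were actually written.
  public static <A, B> int writeChosen(List<A> chosenTexts, List<B> chosenClusters,
                                       int k, BiConsumer<A, B> append) {
    int n = Math.min(k, chosenTexts.size());   // never read past what was sampled
    for (int i = 0; i < n; i++) {
      append.accept(chosenTexts.get(i), chosenClusters.get(i));
    }
    return n;
  }

  public static void main(String[] args) {
    // Only two records were read from the input, but k = 5 seeds were requested.
    List<String> texts = List.of("doc-a", "doc-b");
    List<String> clusters = List.of("C-0", "C-1");
    List<String> written = new ArrayList<>();
    int count = writeChosen(texts, clusters, 5, (t, c) -> written.add(t + " -> " + c));
    System.out.println(count + " " + written);   // prints 2 [doc-a -> C-0, doc-b -> C-1]
  }
}
```

Returning the written count lets the caller notice when fewer than k seeds were produced, for example to log a warning.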

Patch attached.

This message is automatically generated by JIRA.
