hama-dev mailing list archives

From "Martin Illecker (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-834) Fix KMeans example
Date Sat, 04 Jan 2014 10:51:50 GMT

     [ https://issues.apache.org/jira/browse/HAMA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Illecker updated HAMA-834:
---------------------------------

    Attachment: HAMA-834.patch

Please see the updated patch.
You have to compare double values:
{code}
-      assertTrue(doubleVector.get(0) >= 50 && doubleVector.get(0) < 51);
-      assertTrue(doubleVector.get(1) >= 50 && doubleVector.get(1) < 51);
+      assertEquals(Double.valueOf(50), doubleVector.get(0));
+      assertEquals(Double.valueOf(50), doubleVector.get(1));
{code}
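
As a rough illustration of what this change tightens: the old range check accepts any value in [50, 51), while the new assertion only passes for exactly 50. The sketch below uses plain Java with no Hama or JUnit classes; the class and method names are hypothetical and exist only for this example.

```java
// Illustrative only: contrasts the old range check with an exact comparison.
// CenterCheck, inRange and exact are hypothetical names, not part of the patch.
final class CenterCheck {
  // Mirrors the removed assertion: value >= 50 && value < 51.
  static boolean inRange(double value) {
    return value >= 50 && value < 51;
  }

  // Mirrors the idea of the added assertion: the value must equal 50 exactly.
  static boolean exact(double value) {
    return Double.valueOf(50).equals(value);
  }

  public static void main(String[] args) {
    double[] candidates = {50.0, 50.9};
    for (double c : candidates) {
      System.out.println(c + " inRange=" + inRange(c) + " exact=" + exact(c));
    }
  }
}
```

A center such as 50.9 would have passed the old test but fails the exact comparison.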

BTW, why do we need 101 input vectors instead of 100?
{code}
-      for (int i = 0; i < 100; i++) {
+      for (int i = 0; i < 101; i++) {
{code}
The resulting center of 100 input vectors would be (49.5, 49.5).
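
The arithmetic can be checked with a small sketch. It assumes the test input consists of the vectors (i, i) for i = 0 .. n-1, which is inferred from the centers quoted here and not taken from the test source; the class and method names are hypothetical.

```java
// Sketch under an assumption: the input vectors are (i, i) for i = 0 .. n-1.
// CenterOfRange and meanCoordinate are hypothetical names for illustration.
final class CenterOfRange {
  // Mean of 0, 1, ..., n-1, i.e. one coordinate of the single cluster center.
  static double meanCoordinate(int n) {
    double sum = 0;
    for (int i = 0; i < n; i++) {
      sum += i;
    }
    return sum / n;
  }

  public static void main(String[] args) {
    System.out.println(meanCoordinate(100)); // 4950 / 100 = 49.5
    System.out.println(meanCoordinate(101)); // 5050 / 101 = 50.0
  }
}
```

Under that assumption, 101 vectors give a center of exactly (50, 50), which is what the exact assertion expects.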

{quote}
What do you mean exactly?
{quote}

Finally, I want to verify the result for different values of *NumBspTask*.
Therefore I set *NumBspTask* within TestKMeansBSP.
{code}
+      job.setNumBspTask(3);
+      System.out.println("NumBspTask: " + job.getNumBspTask());
{code}
But *LAUNCHED_TASK* always stays at one?


> Fix KMeans example
> ------------------
>
>                 Key: HAMA-834
>                 URL: https://issues.apache.org/jira/browse/HAMA-834
>             Project: Hama
>          Issue Type: Bug
>          Components: examples, machine learning
>    Affects Versions: 0.6.3
>            Reporter: Martin Illecker
>              Labels: example
>             Fix For: 0.7.0
>
>         Attachments: HAMA-834.patch
>
>
> Fix problems in KMeans example and revise test case.
> 1) Typo \[1] and input path issue
> 2) Wrong *summationCount* in assignCentersInternal
> *summationCount* should also be incremented if \[2] 
> {code}
> if (clusterCenter == null) {
>   newCenterArray[lowestDistantCenter] = key;
> }
> {code}
> Otherwise *summationCount* may stay zero when only one value is assigned. This zero is then propagated to *incrementSum* \[3] and might cause a divide by zero in \[4].
> Likewise, if we add three vectors but *summationCount* is only two, the result will be wrong, because the summed vector is later divided by the number of increments.
> 3) Results depend on the value of *numBspTask*
> (results vary if *numBspTask* is changed)
> \[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
> \[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
> \[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
> \[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172
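
The counting problem described in item 2) of the issue can be reproduced with a simplified accumulator. This is a hypothetical stand-in, not the KMeansBSP code: the class `CenterSum`, its fields, and the `countFirst` flag are invented for illustration. With `countFirst = false` the first assigned vector is stored without incrementing the count, as the issue describes; with `countFirst = true` it is counted, as the issue proposes.

```java
// Hypothetical, simplified model of per-center accumulation; not the Hama code.
final class CenterSum {
  private double[] sum;             // null until the first vector is assigned
  private int count;                // plays the role of summationCount
  private final boolean countFirst; // true = count the first vector as well

  CenterSum(boolean countFirst) {
    this.countFirst = countFirst;
  }

  void add(double[] v) {
    if (sum == null) {
      sum = v.clone();
      if (countFirst) {
        count++; // the increment the issue says is missing
      }
    } else {
      for (int i = 0; i < sum.length; i++) {
        sum[i] += v[i];
      }
      count++;
    }
  }

  int count() {
    return count;
  }

  // Divides the summed vector by the number of counted increments.
  double[] mean() {
    double[] m = new double[sum.length];
    for (int i = 0; i < sum.length; i++) {
      m[i] = sum[i] / count;
    }
    return m;
  }

  public static void main(String[] args) {
    double[][] vectors = {{1, 1}, {2, 2}, {3, 3}};
    for (boolean flag : new boolean[] {false, true}) {
      CenterSum acc = new CenterSum(flag);
      for (double[] v : vectors) {
        acc.add(v);
      }
      System.out.println("countFirst=" + flag + " count=" + acc.count()
          + " mean[0]=" + acc.mean()[0]);
    }
  }
}
```

With three vectors the uncounted variant divides the sum 6 by 2 and reports 3.0 where 2.0 is expected. With a single vector its count stays at zero; in this double-based sketch the division then yields Infinity or NaN instead of throwing.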



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
