hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [jira] [Updated] (HAMA-834) Fix KMeans example
Date Sat, 04 Jan 2014 13:21:11 GMT
> But *LAUNCHED_TASK* stays always one?

The number of BSP tasks is determined by the InputFormat. Basically, the number of tasks equals
the number of blocks of a single input file, or the number of input files when there are several.
So, you can’t force the number of tasks without partitioning the input.

Meanwhile, in the GraphJob case, PartitioningRunner creates the partitions as the user desires, and
it runs before GraphJobRunner. So, you can set the number of tasks for a graph job.
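
To make the rule above concrete, here is a minimal sketch in plain Java. It is not Hama's actual split logic: the class TaskCountSketch, the helper estimateTasks, and the 64 MB block size are all assumptions made for illustration. It only models "one task per block of each input file, and at least one task per file".

```java
public class TaskCountSketch {

  // Hypothetical helper (not part of Hama): one task per block of each
  // input file, with at least one task per file.
  static int estimateTasks(long[] fileSizes, long blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    int tasks = 0;
    for (long size : fileSizes) {
      long blocks = (size + blockSize - 1) / blockSize; // ceiling division
      tasks += (int) Math.max(1L, blocks);
    }
    return tasks;
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024; // assumed block size of 64 MB

    // One small input file: a single block, so a single task.
    System.out.println(estimateTasks(new long[] {2_000L}, blockSize));

    // One 200 MB file: four blocks, so four tasks.
    System.out.println(estimateTasks(new long[] {200L * 1024 * 1024}, blockSize));

    // Three small files: one task per file.
    System.out.println(estimateTasks(new long[] {2_000L, 2_000L, 2_000L}, blockSize));
  }
}
```

Under this simplified model, a test that writes one small input file ends up with one task no matter what value is passed to setNumBspTask, which matches the behavior reported in the quoted message.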

On Jan 4, 2014, at 7:51 PM, Martin Illecker (JIRA) <jira@apache.org> wrote:

> 
>     [ https://issues.apache.org/jira/browse/HAMA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Martin Illecker updated HAMA-834:
> ---------------------------------
> 
>    Attachment: HAMA-834.patch
> 
> Please see the updated patch.
> You have to compare double values:
> {code}
> -      assertTrue(doubleVector.get(0) >= 50 && doubleVector.get(0) < 51);
> -      assertTrue(doubleVector.get(1) >= 50 && doubleVector.get(1) < 51);
> +      assertEquals(Double.valueOf(50), doubleVector.get(0));
> +      assertEquals(Double.valueOf(50), doubleVector.get(1));
> {code}
> 
> BTW why do we need 101 input vectors not 100?
> {code}
> -      for (int i = 0; i < 100; i++) {
> +      for (int i = 0; i < 101; i++) {
> {code}
> The resulting center of 100 input vectors would be (49.5, 49.5).
> 
> {quote}
> What do you mean exactly?
> {quote}
> 
> Finally I want to verify the result for a different amount of *NumBspTask*.
> Therefore I set the *NumBspTask* within TestKMeansBSP.
> {code}
> +      job.setNumBspTask(3);
> +      System.out.println("NumBspTask: " + job.getNumBspTask());
> {code}
> But *LAUNCHED_TASK* stays always one?
> 
> 
>> Fix KMeans example
>> ------------------
>> 
>>                Key: HAMA-834
>>                URL: https://issues.apache.org/jira/browse/HAMA-834
>>            Project: Hama
>>         Issue Type: Bug
>>         Components: examples, machine learning
>>   Affects Versions: 0.6.3
>>           Reporter: Martin Illecker
>>             Labels: example
>>            Fix For: 0.7.0
>> 
>>        Attachments: HAMA-834.patch
>> 
>> 
>> Fix problems in KMeans example and revise test case.
>> 1) Typo \[1] and input path issue
>> 2) Wrong *summationCount* in assignCentersInternal
>> *summationCount* should also be incremented if \[2] 
>> {code}
>> if (clusterCenter == null) {
>>  newCenterArray[lowestDistantCenter] = key;
>> }
>> {code}
>> Otherwise *summationCount* may stay zero when only one value is assigned. Then this zero will be propagated to *incrementSum* \[3] and might cause a divide by zero in \[4].
>> By the way if we add three vectors and the *summationCount* would only be two, this will lead to wrong results. Because later we are dividing the vector by the amount of increments.
>> 3) Results depend on the amount *numBspTask*
>> (results vary if *numBspTask* is changed)
>> \[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
>> \[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
>> \[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
>> \[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
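
The arithmetic in the quoted message (100 versus 101 input vectors, and a *summationCount* that is one too low) can be checked with a small sketch. It assumes the test writes the vectors (i, i) for i = 0, 1, 2, ..., so a single coordinate is enough. The class CentroidSketch and the helper center are hypothetical names and are not taken from KMeansBSP.

```java
public class CentroidSketch {

  // Hypothetical helper: sums the coordinates 0, 1, ..., n-1 and divides by
  // the given count. Passing count != n models a wrong summation count.
  static double center(int n, int count) {
    double sum = 0;
    for (int i = 0; i < n; i++) {
      sum += i;
    }
    return sum / count;
  }

  public static void main(String[] args) {
    // 100 vectors (0..99): the sum is 4950, so the center is 49.5.
    System.out.println(center(100, 100));

    // 101 vectors (0..100): the sum is 5050, so the center is 50.0.
    System.out.println(center(101, 101));

    // Three vectors (0, 1, 2) with the correct count: the center is 1.0.
    System.out.println(center(3, 3));

    // The same three vectors divided by a count of two: 1.5, a wrong center.
    System.out.println(center(3, 2));
  }
}
```

With 101 vectors the expected center is exactly 50, which is why the patched test can use assertEquals, while 100 vectors would give 49.5.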

