hama-dev mailing list archives

From "Martin Illecker (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HAMA-834) Fix KMeans example
Date Thu, 09 Jan 2014 10:08:52 GMT

    [ https://issues.apache.org/jira/browse/HAMA-834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866474#comment-13866474 ]

Martin Illecker edited comment on HAMA-834 at 1/9/14 10:07 AM:
---------------------------------------------------------------

Problem when K = N
{code}
hama jar hama/hama-examples-0.7.0-SNAPSHOT.jar kmeans /tmp/kmeans/in /tmp/kmeans/out 5 5 -g 5 5
{code}

{code}
14/01/09 10:33:30 WARN fs.FSInputChecker: Problem opening checksum file: /tmp/kmeans/in/center/cen.seq. Ignoring exception: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:134)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1499)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
	at org.apache.hama.ml.kmeans.KMeansBSP.setup(KMeansBSP.java:91)
	at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
	at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:287)
	at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
{code}


was (Author: bafu):
Problem when running in local mode:
{code}
hama jar hama/hama-examples-0.7.0-SNAPSHOT.jar kmeans /tmp/kmeans/in /tmp/kmeans/out 5 5 -g 5 5
{code}

{code}
14/01/09 10:33:30 WARN fs.FSInputChecker: Problem opening checksum file: /tmp/kmeans/in/center/cen.seq. Ignoring exception: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:134)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
	at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1499)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
	at org.apache.hama.ml.kmeans.KMeansBSP.setup(KMeansBSP.java:91)
	at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.run(LocalBSPRunner.java:256)
	at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:287)
	at org.apache.hama.bsp.LocalBSPRunner$BSPRunner.call(LocalBSPRunner.java:211)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
{code}

> Fix KMeans example
> ------------------
>
>                 Key: HAMA-834
>                 URL: https://issues.apache.org/jira/browse/HAMA-834
>             Project: Hama
>          Issue Type: Bug
>          Components: examples, machine learning
>    Affects Versions: 0.6.3
>            Reporter: Martin Illecker
>            Assignee: Martin Illecker
>              Labels: example
>             Fix For: 0.7.0
>
>         Attachments: HAMA-834.patch, HAMA-834_v02.patch, HAMA-834_v03.patch
>
>
> Fix problems in KMeans example and revise test case.
> 1) Typo \[1] and input path issue
> 2) Wrong *summationCount* in assignCentersInternal
> *summationCount* should also be incremented if \[2] 
> {code}
> if (clusterCenter == null) {
>   newCenterArray[lowestDistantCenter] = key;
> }
> {code}
> Otherwise *summationCount* may stay zero when only one value is assigned. This zero is then propagated to *incrementSum* \[3] and might cause a division by zero in \[4].
> Also, if we add three vectors while *summationCount* is only two, the results will be wrong, because the vector is later divided by the number of increments.
> 3) Results depend on the number of BSP tasks (*numBspTask*)
> (results vary if *numBspTask* is changed)
> \[1] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L518-519
> \[2] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L249
> \[3] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L161
> \[4] https://github.com/apache/hama/blob/trunk/ml/src/main/java/org/apache/hama/ml/kmeans/KMeansBSP.java#L172
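
To illustrate point 2 of the description, here is a minimal standalone sketch of the counting problem. It is not the KMeansBSP code: the class SummationCountSketch, the CenterSum holder, and the methods assignBuggy, assignFixed, and mean are hypothetical names made up for this example, and the explicit zero check in mean is an assumption added so that the failure is visible (plain double division in Java would yield NaN or Infinity instead of throwing).

```java
import java.util.Arrays;

public class SummationCountSketch {

    /** Hypothetical accumulator for one center: a running vector sum plus a count. */
    static final class CenterSum {
        double[] sum; // null until the first vector is assigned
        int count;    // number of vectors that were counted
    }

    /** Buggy variant: the first vector initializes the sum but is never counted. */
    static void assignBuggy(CenterSum c, double[] v) {
        if (c.sum == null) {
            c.sum = v.clone();
        } else {
            add(c.sum, v);
            c.count++;
        }
    }

    /** Fixed variant: every assigned vector increments the count, including the first. */
    static void assignFixed(CenterSum c, double[] v) {
        if (c.sum == null) {
            c.sum = v.clone();
        } else {
            add(c.sum, v);
        }
        c.count++;
    }

    /** Adds v to acc element by element. */
    static void add(double[] acc, double[] v) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] += v[i];
        }
    }

    /** Divides the summed vector by the count; rejects a zero count explicitly. */
    static double[] mean(CenterSum c) {
        if (c.count == 0) {
            throw new ArithmeticException("count is zero");
        }
        double[] m = new double[c.sum.length];
        for (int i = 0; i < m.length; i++) {
            m[i] = c.sum[i] / c.count;
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] vectors = { {1, 1}, {2, 2}, {3, 3} };
        CenterSum buggy = new CenterSum();
        CenterSum fixed = new CenterSum();
        for (double[] v : vectors) {
            assignBuggy(buggy, v);
            assignFixed(fixed, v);
        }
        // Three vectors summed, but the buggy count is 2, so the mean is too large.
        System.out.println("buggy mean: " + Arrays.toString(mean(buggy))); // [3.0, 3.0]
        System.out.println("fixed mean: " + Arrays.toString(mean(fixed))); // [2.0, 2.0]
    }
}
```

With a single assigned vector the buggy variant leaves the count at zero, so mean throws; with three vectors it divides by two instead of three. Both match the symptoms described in the issue.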



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
