hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: KMeansBSP number of BSP tasks
Date Tue, 29 Jul 2014 06:41:29 GMT
Sorry for the inconvenience!

Since the Kmeans example allows only a text file as input, I think you
have to create your own Kmeans job runner. Use KMeansBSP.prepareInput
instead of prepareInputText.

Please see http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/Kmeans.java
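As a rough illustration of the two options mentioned in the quoted reply below (adjusting the block size, or writing multiple input files), here is a minimal sketch in Python. It assumes the number of tasks follows the number of blocks or files of the input, as discussed in this thread. The helper names block_size_for and split_text_file are hypothetical and not part of Hama, and the sketch works on the local filesystem only. The resulting part files would still need to be copied to the cluster and read by a custom job runner, since the Kmeans example runner rejected a directory input in this thread.

```python
import os
from contextlib import ExitStack


def block_size_for(file_size_bytes, desired_tasks):
    """Return a block size (bytes) such that a file of the given size
    spans at most desired_tasks blocks (hypothetical helper)."""
    if desired_tasks < 1:
        raise ValueError("desired_tasks must be >= 1")
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + desired_tasks - 1) // desired_tasks


def split_text_file(src_path, out_dir, num_parts):
    """Distribute the lines of a text file round-robin over num_parts
    files in out_dir and return their paths (hypothetical helper)."""
    if num_parts < 1:
        raise ValueError("num_parts must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, "part-%05d.txt" % i) for i in range(num_parts)]
    with ExitStack() as stack:
        outs = [stack.enter_context(open(p, "w")) for p in paths]
        with open(src_path) as src:
            for i, line in enumerate(src):
                # Line i goes to part i mod num_parts, keeping parts balanced.
                outs[i % num_parts].write(line)
    return paths
```

For example, split_text_file("points.txt", "kmeans-input", 8) would write eight part files with roughly equal numbers of lines; the file and directory names here are placeholders.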

On Mon, Jul 28, 2014 at 9:33 PM, Giannis Giannakopoulos
<giannisgiannak@gmail.com> wrote:
> Ok then, how can I feed the KMeans job with multiple files as input?
> When I create a directory and put all my input files inside it, the
> job complains about the type of input (not a text file) and exits. Any
> thoughts on this?
> Thank you very much for your time,
> Giannis
> On 07/28/2014 03:30 PM, Edward J. Yoon wrote:
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>> Right.
>> If you want to specify the number of tasks, you have to adjust the block
>> size, or write multiple files as you want. See KMeansBSP.prepareInput()
>> and prepareInputText().
>> --
>> Best Regards, Edward J. Yoon
>> Chief Executive Officer
>> DataSayer Co., Ltd.
>> On Jul 28, 2014, at 5:31 PM, Giannis Giannakopoulos <giannisgiannak@gmail.com> wrote:
>>> Hello everyone,
>>> I am trying to run the kmeans clustering algorithm from the hama
>>> examples, but I face some problems. Specifically, I want to change the
>>> number of BSP tasks launched, something that is not possible through
>>> this
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>>> To this end, I tried to use the KMeansBSP
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29>
>>> job, which exposes the number of launched tasks as a parameter, but I
>>> can't make it work. Specifically, I tried both text and sequence file
>>> input formats, but the job always fails with the message
>>> "Cannot create <name of input>; already exists as a directory"
>>> When putting a non-existing dir, I get the same message.
>>> Can someone please guide me through this? I want to run KMeans and I
>>> want to set the number of BSP tasks to launch (even if this means
>>> partitioning the input file -- I haven't found anything about this
>>> online regarding KMeans).
>>> Thank you in advance,
>>> Giannis

Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
