hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: KMeansBSP number of BSP tasks
Date Tue, 29 Jul 2014 06:41:29 GMT
Sorry for the inconvenience!

Since the Kmeans example accepts only a text file as input, I think you
will have to write your own Kmeans job runner. Use KMeansBSP.prepareInput
instead of prepareInputText.

Please see http://svn.apache.org/repos/asf/hama/trunk/examples/src/main/java/org/apache/hama/examples/Kmeans.java
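
A minimal sketch of the "write multiple files" step suggested earlier in this thread (quoted below), for readers who want to control how many pieces the input is divided into. It only splits a local text file round-robin into a chosen number of part files using the standard Java library. It does not call any Hama API, and the class name InputSplitter, the method split, and the part-NNNNN file names are hypothetical. Whether Hama then launches one task per file depends on the input format and block size, so treat that as an assumption to verify, and note that the files would still have to be copied to HDFS and converted into whatever format your own job runner expects.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama: splits one text file into several.
public class InputSplitter {

    /**
     * Distributes the lines of `input` round-robin over `parts` files
     * created inside `outDir`, and returns the paths of those files.
     */
    public static List<Path> split(Path input, Path outDir, int parts) throws IOException {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        Files.createDirectories(outDir);

        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try {
            // Open one writer per output file (file names are an assumption).
            for (int i = 0; i < parts; i++) {
                Path p = outDir.resolve(String.format("part-%05d", i));
                paths.add(p);
                writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
            }
            // Send line n to file (n mod parts), so the files stay balanced.
            try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                long n = 0;
                while ((line = reader.readLine()) != null) {
                    BufferedWriter w = writers.get((int) (n % parts));
                    w.write(line);
                    w.newLine();
                    n++;
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return paths;
    }
}
```

Calling InputSplitter.split(inputFile, outputDir, 4) would produce four files with roughly equal numbers of vectors, one vector per line as in the original text input.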

On Mon, Jul 28, 2014 at 9:33 PM, Giannis Giannakopoulos
<giannisgiannak@gmail.com> wrote:
> Ok then, how can I feed the KMeans job with multiple files as input?
> When I create a directory and put all my input files inside it, the
> job complains about the input type (not a text file) and exits. Any
> thoughts on this?
>
>
> Thank you very much for your time,
> Giannis
>
> On 07/28/2014 03:30 PM, Edward J. Yoon wrote:
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>> Right.
>>
>> If you want to specify the number of tasks, you have to adjust the
>> block size, or write multiple files as you want. See the
>> KMeansBSP.prepareInput() and prepareInputText() methods.
>>
>> --
>> Best Regards, Edward J. Yoon
>> Chief Executive Officer
>> DataSayer Co., Ltd.
>>
>> On Jul 28, 2014, at 5:31 PM, Giannis Giannakopoulos <giannisgiannak@gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> I am trying to run the kmeans clustering algorithm from the hama
>>> examples, but I face some problems. Specifically, I want to change the
>>> number of BSP tasks launched, something that is not possible through
>>> this
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-examples/0.6.2/org/apache/hama/examples/Kmeans.java>
>>> , right? (meaning that the number of tasks is determined by the number
>>> of blocks of the input file).
>>>
>>> To this end, I tried to use the KmeansBSP
>>> <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hama/hama-ml/0.6.4/org/apache/hama/ml/kmeans/KMeansBSP.java#KMeansBSP.main%28java.lang.String[]%29>
>>> job, which exposes the number of launched tasks as a parameter, but I
>>> can't make it work :$. Specifically, I tried both text and sequence file
>>> input formats but the job always fails with the message
>>>
>>> "Cannot create <name of input>; already exists as a directory"
>>>
>>> When putting a non-existing dir, I get the same message.
>>>
>>> Can someone please guide me through this? I want to run KMeans and I
>>> want to set the number of BSP tasks to launch (even if this means
>>> partitioning the input file -- I haven't found anything about this
>>> online regarding KMeans).
>>>
>>> Thank you in advance,
>>> Giannis
>>>
>>
>



-- 
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
