hama-user mailing list archives

From "Edward J. Yoon" <edward.y...@samsung.com>
Subject RE: Hama parition 1000 files on 3 tasks/machine
Date Tue, 26 May 2015 05:57:27 GMT
Yeah, that's also a good alternative. Users can directly access external
resources (such as HDFS, NoSQL stores, and RDBMSs) and partition the data
using the messaging APIs.
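
A minimal sketch of that idea, written in plain Java rather than against
the Hama API so that it runs on its own. It assumes one designated peer
enumerates the input paths and routes each path to a target peer chosen by
hash; the class name, method name, and in-memory "inboxes" are hypothetical
stand-ins for real message passing. In an actual BSP job the add() call
would presumably become a message send followed by a barrier sync.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MessageSplitSketch {

    /**
     * Simulates the distributing peer: picks a target peer for every path
     * and appends the path to that peer's inbox (a stand-in for a message).
     */
    public static Map<Integer, List<String>> distribute(List<String> paths, int numPeers) {
        if (numPeers <= 0) {
            throw new IllegalArgumentException("numPeers must be positive");
        }
        Map<Integer, List<String>> inboxes = new HashMap<>();
        for (int p = 0; p < numPeers; p++) {
            inboxes.put(p, new ArrayList<>());
        }
        for (String path : paths) {
            // floorMod keeps the index non-negative even for negative hash codes.
            int target = Math.floorMod(path.hashCode(), numPeers);
            inboxes.get(target).add(path);
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            paths.add("/input/file-" + i + ".dat"); // hypothetical paths
        }
        Map<Integer, List<String>> inboxes = distribute(paths, 3);
        for (Map.Entry<Integer, List<String>> e : inboxes.entrySet()) {
            System.out.println("peer " + e.getKey() + " received "
                    + e.getValue().size() + " paths");
        }
    }
}
```

Hash routing sends every path to exactly one peer but does not guarantee
equal shares; a round-robin counter could be used instead if balance
matters more than a stable path-to-peer mapping.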

However, I think we need to provide a solution at the framework level.

--
Best Regards, Edward J. Yoon


-----Original Message-----
From: Chia-Hung Lin [mailto:clin4j@googlemail.com]
Sent: Tuesday, May 26, 2015 2:39 PM
To: user@hama.apache.org
Subject: Re: Hama parition 1000 files on 3 tasks/machine

An alternative thought:

In addition to the (key/value) interface provided by Hama, each
process (within the bsp function) should be able to read data from
an external source with a Reader-related class; but the processes may
need to use something like ZooKeeper for coordination.
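
One way this could work without a coordination service, sketched in plain
Java: if every process sees the same sorted file listing and knows its own
index and the total number of tasks, each one can pick its share by index
modulo. The class and method names are hypothetical, and the assumption
that all processes obtain an identical listing is exactly what a
coordination service would otherwise have to guarantee.

```java
import java.util.ArrayList;
import java.util.List;

public class StaticFileSplit {

    /**
     * Returns the files that the task with the given index should read:
     * every numTasks-th entry of the shared, sorted listing, starting at
     * taskIndex.
     */
    public static List<String> filesForTask(List<String> sortedFiles,
                                            int taskIndex, int numTasks) {
        if (numTasks <= 0 || taskIndex < 0 || taskIndex >= numTasks) {
            throw new IllegalArgumentException("invalid task index or task count");
        }
        List<String> mine = new ArrayList<>();
        for (int i = taskIndex; i < sortedFiles.size(); i += numTasks) {
            mine.add(sortedFiles.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(String.format("part-%04d", i)); // hypothetical file names
        }
        for (int task = 0; task < 3; task++) {
            System.out.println("task " + task + " reads "
                    + filesForTask(files, task, 3).size() + " files");
        }
    }
}
```

With 1000 files and 3 tasks this yields 334, 333, and 333 files, and no
file is read twice as long as the listing is identical everywhere.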

FYI



On 26 May 2015 at 06:43, Edward J. Yoon <edward.yoon@samsung.com> wrote:
> Hi,
>
> Currently, the task capacity of the cluster should be larger than the number
> of blocks or files in the input dataset. The alternative is to merge them
> into one file using the hadoop fs -getmerge command.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Tuesday, May 26, 2015 1:14 AM
> To: user@hama.apache.org
> Subject: Hama parition 1000 files on 3 tasks/machine
>
> Hi,
> I have a problem regarding data partitioning but was not able to find any
> solution online.
>
> Problem: I have around 1000 files that I want to process using Hama. Each
> file has the same schema/structure but different data. How can I divide
> these files across my cluster? For example, if I have 3 tasks/machines,
> each task should process around 333 files.
>
> So,
> 1- How can I take a thousand files as input in Hama? As I currently
> understand it, Hama will open 1000 tasks (1 task for each file).
> 2- How can I divide the files across different machines (a custom
> Partitioner, maybe)?
> 3- If this approach is not supported, what would be an alternative
> approach to solving this?
>
> Regards,
> Behroz Sikander
>
>


