hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Task Priorities
Date Sat, 08 Dec 2012 00:48:52 GMT
A task is created per input split, and input splits are created one per
block of each input file by default. If the block size is 60~200 MB, 1~3
GB of memory per task is enough.
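The sizing rule above can be sketched as simple arithmetic. This is a hedged back-of-the-envelope sketch, not Hama API code; the 12 GB input and 128 MB block size are hypothetical numbers chosen to match the 60~200 MB range mentioned above.

```java
// Back-of-the-envelope task/memory sizing, assuming one task per
// HDFS block (the default: one input split per block).
public class TaskSizing {
    // One input split, hence one task, per block; ceiling division.
    static long numTasks(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long mb = 1L << 20;
        long input = 12 * gb;   // hypothetical 12 GB input file
        long block = 128 * mb;  // a block size in the 60~200 MB range
        long tasks = numTasks(input, block);
        // With the 1~3 GB per task rule of thumb from the email:
        System.out.println(tasks + " tasks, needing roughly "
                + tasks + " to " + (3 * tasks) + " GB of memory in total");
    }
}
```

For a 12 GB input with 128 MB blocks this gives 96 tasks, so on the order of 96~288 GB of aggregate task memory.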

Yeah, there's still a queueing/messaging scalability issue, as you
know. However, in my experience, the message bundler and compressor
are mainly responsible for poor scalability and consume huge amounts
of memory. This is more urgent than the "queue".

On Sat, Dec 8, 2012 at 2:05 AM, Thomas Jungblut
<thomas.jungblut@gmail.com> wrote:
>>
>>  not disk-based.
>
>
> So how do you want to achieve scalability without that?
> In order to process tasks independently of each other (not in parallel, but
> e.g. in small mini-batches), you have to save the state. RAM is limited and
> can't store huge states (persistently, in case of crashes).
>
> 2012/12/7 Suraj Menon <surajsmenon@apache.org>
>
>> On Thu, Dec 6, 2012 at 8:27 PM, Edward J. Yoon <edwardyoon@apache.org
>> >wrote:
>>
>> > I think large data processing capability is more important than fault
>> > tolerance at the moment.
>> >
>>
>> +1
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
