hadoop-mapreduce-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: partition as block?
Date Tue, 30 Apr 2013 19:04:11 GMT
Well, to be more clear: I'm wondering how hadoop-mapreduce can be optimized
in a block-less filesystem... and am thinking about application-tier ways
to simulate blocks - i.e., by making the granularity of partitions smaller.

Wondering if there is a way to hack an increased number of partitions as a
mechanism to simulate blocks - or whether this is just a bad idea
altogether :)
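
To make the InputFormat idea concrete, here is a rough, untested sketch
against the new mapreduce API - the class name and the 64 MB figure are
just illustrative assumptions on my part:

    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Hypothetical InputFormat that ignores the filesystem-reported
    // block size and carves inputs into fixed 64 MB splits, so a
    // block-less filesystem still yields many map tasks
    // ("simulated blocks").
    public class FixedSplitTextInputFormat extends TextInputFormat {

      private static final long SIMULATED_BLOCK_BYTES = 64L * 1024 * 1024;

      @Override
      protected long computeSplitSize(long blockSize, long minSize,
                                      long maxSize) {
        // FileInputFormat normally returns
        // Math.max(minSize, Math.min(maxSize, blockSize)); here we pin
        // the split size instead of trusting the reported blockSize.
        return SIMULATED_BLOCK_BYTES;
      }
    }

If I understand the API correctly, a similar effect should also be
reachable without subclassing, by calling
FileInputFormat.setMaxInputSplitSize(job, ...) in the driver.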




On Tue, Apr 30, 2013 at 2:56 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Hello Jay,
>
>     What are you going to do in your custom InputFormat and partitioner?
> Is your InputFormat going to create larger splits that will overlap with
> larger blocks? If that is the case, IMHO, you are going to reduce the
> number of mappers, thus reducing parallelism. Also, a much larger block
> size will add extra overhead when it comes to disk I/O.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Wed, May 1, 2013 at 12:16 AM, Jay Vyas <jayunit100@gmail.com> wrote:
>
>> Hi guys:
>>
>> I'm wondering - if I'm running mapreduce jobs on a cluster with large
>> block sizes, can I increase performance with any of the following:
>>
>> 1) A custom FileInputFormat
>>
>> 2) A custom partitioner
>>
>> 3) -DnumReducers
>>
>> Clearly, (3) will be an issue because it might overload tasks and
>> network traffic... but maybe (1) or (2) would be a precise way to "use"
>> partitions as a "poor man's" block.
>>
>> Just a thought - not sure if anyone has tried (1) or (2) before to
>> simulate blocks and increase locality by utilizing the partition API.
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>
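
P.S. On (2) - my understanding is that the partition API only governs how
map output keys are routed to reducers, so it can fan work out more finely
on the reduce side but won't change input locality. A rough, hypothetical
sketch of what I mean (the names are made up):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Hypothetical partitioner that spreads map output evenly across
    // reducers; the "finer granularity" comes from raising the reducer
    // count (numPartitions), not from this method itself.
    public class FineGrainedPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

Since that is essentially what HashPartitioner already does, the real
lever would be the reducer count - which circles back to the concerns
about (3).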


-- 
Jay Vyas
http://jayunit100.blogspot.com
