hadoop-hdfs-user mailing list archives

From Harshit Mathur <mathursh...@gmail.com>
Subject Re: Smaller block size for more intense jobs
Date Wed, 13 May 2015 04:17:58 GMT
Hi Marko,

If your files are very small (smaller than the block size), a large number
of map tasks will be launched. Each task carries initialization and other
overhead, which degrades overall performance: an individual map may appear
to finish very quickly, yet the job as a whole takes longer.
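
To make the trade-off concrete, here is a rough back-of-the-envelope model in Java. It is only a sketch: the slot count, the 2-second per-task overhead and the 600 seconds of total work are made-up numbers, and the formula assumes tasks run in equal waves across the available map slots.

```java
// Simplified cost model. All numbers are illustrative assumptions, not measurements.
class MapOverheadModel {
    /** Estimated job time in seconds when tasks run in waves across the map slots. */
    static double jobSeconds(int tasks, int slots, double overheadPerTask, double totalWorkSeconds) {
        int waves = (tasks + slots - 1) / slots;        // ceiling division
        double workPerTask = totalWorkSeconds / tasks;  // work assumed to split evenly
        return waves * (overheadPerTask + workPerTask);
    }

    public static void main(String[] args) {
        int slots = 10;          // hypothetical number of map slots in the cluster
        double overhead = 2.0;   // hypothetical startup cost per map task, in seconds
        double work = 600.0;     // hypothetical total computation, in seconds
        for (int tasks : new int[] {1, 10, 100, 1000, 10000}) {
            System.out.printf("%5d tasks -> %.1f s%n", tasks, jobSeconds(tasks, slots, overhead, work));
        }
    }
}
```

Under these assumptions a single task leaves nine slots idle (about 602 s), ten tasks use the cluster well (about 62 s), and ten thousand tiny tasks are dominated by overhead (about 2,060 s). So more maps help only up to the point where the per-task overhead starts to outweigh the work each map does.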

I had a similar problem: there was a huge number of data files, but each
file was much smaller than the block size, so the framework launched a
large number of maps. The overall job took a long time to run. To overcome
this, we used CombineFileInputFormat, which packs several small files into
each input split so that far fewer maps are executed, and the overall job
execution time improved drastically.
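
The idea of combining small files into fewer splits can be sketched as below. This is not Hadoop's code: the class name SplitPacker and the pack method are hypothetical, and the real CombineFileInputFormat also takes node and rack locality into account when grouping blocks. The sketch only shows how a size cap turns many small files into a handful of splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of split combining; not the Hadoop implementation.
class SplitPacker {
    /** Greedily packs file sizes (in bytes) into splits of at most maxSplitBytes each. */
    static List<List<Long>> pack(long[] fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split when adding this file would exceed the cap.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = new long[1000];
        Arrays.fill(files, mb);  // 1,000 hypothetical files of 1 MB each
        System.out.println("files:  " + files.length);
        System.out.println("splits: " + pack(files, 64 * mb).size());
    }
}
```

With one map per file, the 1,000 files in this example would mean 1,000 map tasks; packed under a 64 MB cap they fit into 16 splits. Note that this sketch puts a file larger than the cap into a split of its own rather than cutting it up.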

Can you give some details about the size of your data and the processing
logic in your map function? That will help me understand your issue better.

BR,
Harshit Mathur

On Wed, May 13, 2015 at 1:27 AM, <marko.dinic@nissatech.com> wrote:

>  Hello,
>
>
>
> I'm not sure whether I should set the block size below 64MB when my
> mappers need to do intensive computations.
>
>
>
> I know that larger files are better because of replication and because
> the NameNode is a weak point, but I don't have that much data, and the
> operations that need to be performed on it are intensive.
>
>
>
> It looks like it's better to have smaller block size (at least until there
> is more data) so that multiple Mappers get instantiated, so they could
> share the computations.
>
>
>
> I'm currently talking about Hadoop 1, not YARN. But a heads up about the
> same problem with YARN will be appreciated.
>
>
>
> Thanks,
>
> Marko


-- 
Harshit Mathur
