hadoop-common-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: how to enhance job start up speed?
Date Mon, 13 Aug 2012 13:57:47 GMT
I am not sure I understand, and I guess I am not the only one.

1) What's a worker in your context? Only the logic inside your Mapper or
something else?
2) You should clarify your cases. You seem to have two cases, but both are
expressed as overhead, so I am assuming there is a baseline? Hadoop vs.
sequential, so sequential does not use Hadoop?
3) What is the size of the file?


On Mon, Aug 13, 2012 at 1:51 PM, Matthias Kricke <
matthias.mk.kricke@gmail.com> wrote:

> Hello all,
> I'm using CDH3u3.
> If I want to process one file, set to non-splittable, Hadoop starts one
> Mapper and no Reducer (that's OK for this test scenario). The Mapper
> goes through a configuration step where some variables for the worker
> inside the Mapper are initialized.
> Now the Mapper gives me K,V pairs, which are lines of an input file. I
> process the V with the worker.
> When I compare the run time of Hadoop to the run time of the same process
> run sequentially, I get:
> worker time --> same in both cases
> case: mapper --> overhead of ~32% relative to the worker process (same for
> bigger chunk size)
> case: sequential --> overhead of ~15% relative to the worker process
> It shouldn't be that much slower: because the file is non-splittable, the
> Mapper will be executed where the data is stored by HDFS, won't it?
> Where did those 17% go? How can I reduce this? Does Hadoop need the whole
> time for reading or streaming the data out of HDFS?
> I would appreciate your help,
> Greetings
> mk
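
To make the baseline question concrete, here is a minimal sketch of how the quoted percentages could be computed, assuming that "overhead" means (total run time - worker time) / worker time. The absolute timings below are hypothetical and chosen only to reproduce the ~32% and ~15% figures; they are not measurements from the original post.

```python
def overhead_pct(total_time, worker_time):
    """Overhead of a run, as a percentage of the pure worker time."""
    if worker_time <= 0:
        raise ValueError("worker_time must be positive")
    return 100.0 * (total_time - worker_time) / worker_time


# Hypothetical timings in seconds (assumption, not from the original post).
worker = 100.0            # time spent inside the worker, same in both cases
hadoop_total = 132.0      # whole Hadoop job, single non-splittable Mapper
sequential_total = 115.0  # whole sequential run

hadoop_overhead = overhead_pct(hadoop_total, worker)
sequential_overhead = overhead_pct(sequential_total, worker)

# Difference in percentage points of worker time (the "17%" in the question).
gap_points = hadoop_overhead - sequential_overhead

# The same gap expressed relative to the whole sequential run.
gap_relative = 100.0 * (hadoop_total - sequential_total) / sequential_total

print(f"Hadoop overhead:     {hadoop_overhead:.1f}% of worker time")
print(f"Sequential overhead: {sequential_overhead:.1f}% of worker time")
print(f"Gap: {gap_points:.1f} points, or {gap_relative:.1f}% of the sequential run")
```

Under this reading, the 17 is a difference in percentage points of worker time, so whether it matters depends on the absolute run time: on a short job, a fixed job start-up cost alone could account for it.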

Bertrand Dechoux
