hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: MR job launching is slower
Date Tue, 20 Mar 2012 10:54:27 GMT
Hi,

First, it sounds like you have two 6-core CPUs, giving you 12 cores, not 24.
Even though the OS reports 24 cores, the extra 12 come from hyper-threading and are not real cores.
This matters when it comes to tuning.

To answer your question ... 

You have a single 1 TB HD. That's going to be a major performance bottleneck.
You usually want at least 1 drive per core. With a 12-core box, that's 12 spindles.
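
As a rough illustration of the arithmetic above, here is a minimal Python sketch. The
function name and the inputs (2 sockets, 6 cores per socket, 2 hardware threads per core,
1 disk) are assumptions taken from the guess about your hardware, not measured values.

```python
def core_and_spindle_summary(sockets, cores_per_socket, threads_per_core, disks):
    """Compare physical cores, OS-visible CPUs, and disks for one node."""
    physical = sockets * cores_per_socket      # real cores
    logical = physical * threads_per_core      # what the OS reports with hyper-threading
    return {
        "physical_cores": physical,
        "logical_cpus": logical,
        "disks_per_physical_core": disks / physical,
        # How many more disks would be needed to reach one drive per core.
        "disks_short_of_one_per_core": max(physical - disks, 0),
    }


# Assumed layout: two 6-core CPUs with hyper-threading and a single disk.
summary = core_and_spindle_summary(sockets=2, cores_per_socket=6,
                                   threads_per_core=2, disks=1)
print(summary)
```

With those assumed inputs, the node has 12 physical cores behind the 24 CPUs the OS shows,
and it is 11 disks short of the one-drive-per-core rule of thumb.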

How large is your Hadoop job's jar? It gets pushed out to all of the nodes.
Bigger jars take longer to process and handle.
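
One quick way to check is sketched below in Python, using only the standard library. The
helper name jar_summary and the path myjob.jar are hypothetical; point it at whatever jar
you actually submit.

```python
import os
import zipfile


def jar_summary(jar_path):
    """Return (size in MB, number of entries) for a jar, which is a zip archive."""
    size_mb = os.path.getsize(jar_path) / (1024 * 1024)
    with zipfile.ZipFile(jar_path) as jar:
        entries = len(jar.namelist())
    return size_mb, entries


if __name__ == "__main__":
    path = "myjob.jar"  # hypothetical path, replace with your job jar
    if os.path.exists(path):
        size_mb, entries = jar_summary(path)
        print(f"{path}: {size_mb:.2f} MB, {entries} entries")
```

A jar that bundles many dependencies will show up here as a large size and a high entry
count.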

Having said that, a startup time of 3-5 seconds isn't out of whack.
It depends on what job you're launching and what you are doing within the job. Remember that
the tasks have to report back to the JobTracker (JT).

Do you have Ganglia up and running?
You will probably see a high load on the CPUs followed by a lot of I/O wait.
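
If Ganglia isn't available, a spot check on one Linux node is possible from /proc/stat.
The Python sketch below is a stand-in for that, not what Ganglia does internally: it takes
two samples of the aggregate "cpu" line and returns the share of time spent in iowait
between them. The function name and the sample lines are made up for illustration.

```python
def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat 'cpu' lines."""
    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Fields are user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Guest time is already counted in user, so only the first eight are summed.
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("samples must be taken at two different times")
    return deltas[4] / total


# Made-up samples; on a real node, read the first line of /proc/stat twice,
# a few seconds apart, while the job is running.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  200 0 100 1200 500 0 0 0 0 0"
print(f"iowait share: {iowait_fraction(sample_before, sample_after):.0%}")
```

A consistently large iowait share while tasks are running would point at the disk rather
than the CPUs.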

HTH

-Mike

On Mar 20, 2012, at 5:40 AM, praveenesh kumar wrote:

> I have a 10-node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 GB
> ethernet connection).
> After triggering any MR job, it takes about 3-5 seconds to launch (I mean
> the time until I can see any MR job completion % on the screen).
> I know that internally it is trying to launch the job, initialize mappers, load
> data, etc.
> What I want to know: is this default/expected Hadoop behavior, or
> are there ways to decrease this startup time?
> 
> Also, I feel my Hadoop jobs should run faster, but I am still not able
> to make them as fast as I think they should be.
> I did some tuning as well. The following are the parameters I have been playing
> with these days, but I still feel there is something missing that I could
> use:
> 
> dfs.block.size
> mapred.compress.map.output
> mapred.map/reduce.tasks.speculative.execution
> mapred.tasktracker.map/reduce.tasks.maximum
> mapred.child.java.opts
> io.sort.mb
> io.sort.factor
> mapred.reduce.parallel.copies
> mapred.job.reuse.jvm.num.tasks
> 
> 
> Thanks,
> Praveenesh

