hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Question on running simultaneous jobs
Date Thu, 10 Jan 2008 18:26:46 GMT
Joydeep Sen Sarma wrote:
> if the cluster is unused - why restrict parallelism? if someone's willing to wake up
> at 4am to beat the crowd - they would just absolutely hate this.

[It would be better to make your comments in Jira.]

But if someone starts a long-running job at night that uses the entire 
cluster then they could monopolize the cluster into the day.  If 
speculative execution is enabled, then some tasks could be killed to 
make room for other jobs that are started in the morning, but that's not 
always possible.  And, if it's not, pickling a job's state and swapping 
it to HDFS would be expensive.

Note also that a task-limiting cluster will still run faster at 
night.  If you've got 50 nodes with up to 200 tasks running at a time, 
then tasks will run faster when only 50 are running.  The network is 
also a primary bottleneck, and it will be less congested when fewer jobs 
are running, and disk contention will be lower too.  So night owls would 
still have significant advantages.
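
As a rough illustration of that arithmetic, here is a minimal sketch. It assumes tasks are spread evenly across nodes and that tasks on the same node split its resources equally, which is a simplification and not a measurement of any real cluster. The function names are hypothetical.

```python
def tasks_per_node(running_tasks: int, nodes: int) -> float:
    """Average number of concurrently running tasks per node (even spread assumed)."""
    return running_tasks / nodes


def per_task_share(running_tasks: int, nodes: int) -> float:
    """Fraction of one node's resources each task gets, assuming equal sharing.

    A task alone on a node gets the whole node (1.0); it cannot get more.
    """
    load = tasks_per_node(running_tasks, nodes)
    return 1.0 if load <= 1 else 1.0 / load


NODES = 50

# Busy daytime cluster: 200 tasks on 50 nodes, i.e. 4 tasks contending per node.
daytime_share = per_task_share(200, NODES)

# Quiet night: only 50 tasks running, i.e. 1 task per node.
night_share = per_task_share(50, NODES)

print(f"daytime share per task: {daytime_share:.2f}")
print(f"night share per task:   {night_share:.2f}")
```

Under these assumptions each night-time task gets about four times the node resources of a daytime task, before even counting the reduced network and disk contention.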

It's not intended as a perfect solution, but rather a substantial 
improvement for many users that's not too hard to implement.

