hadoop-hdfs-user mailing list archives

From xeon Mailinglist <xeonmailingl...@gmail.com>
Subject Fwd: Submit, suspend and resume a mapreduce job execution
Date Sun, 21 Aug 2016 12:15:45 GMT
I know that it is not possible to suspend and resume a MapReduce job, but I
really need to find a workaround. I have looked at ChainedJobs and at the
CapacityScheduler, but I am really clueless about what to do.

The main goal was to suspend a job after the map tasks finish and before the
reduce tasks start. I know that this is not possible, so I have created two
jobs: one that executes all the map tasks (Job 1), and another that executes
all the reduce tasks (Job 2). Since I can't start a job that runs only
reduce tasks, it was necessary to add an identity mapper before running the
reducers. So in the end, I have Job 1, which just executes all the map tasks,
and Job 2, which executes the identity mappers and the reduce tasks. But this
really kills performance, and I wish I could find a way to do better. I have
thought about piping the output of Job 1 into Job 2, but in the end I really
need to stop the execution between these two jobs.
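
The two-job split described above can be sketched in plain Python. This is
only an illustration of the idea, not Hadoop code: the function names, the
JSON-lines checkpoint file, and the word-count logic are all assumptions made
for the example. In Hadoop itself, Job 1 would be a map-only job (for
instance via `job.setNumReduceTasks(0)`), with its output stored on HDFS.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_map_phase(lines, checkpoint_path):
    """Job 1 (hypothetical): run only the map step and persist its output."""
    with open(checkpoint_path, "w", encoding="utf-8") as out:
        for line in lines:
            for word in line.split():
                # Emit one (key, value) pair per word, one JSON record per line.
                out.write(json.dumps([word.lower(), 1]) + "\n")


def run_reduce_phase(checkpoint_path):
    """Job 2 (hypothetical): identity-map the saved pairs, group, then reduce."""
    groups = defaultdict(list)
    with open(checkpoint_path, encoding="utf-8") as src:
        for record in src:
            key, value = json.loads(record)  # identity map: pass pairs through
            groups[key].append(value)        # stand-in for the shuffle
    return {key: sum(values) for key, values in groups.items()}


def demo():
    lines = ["suspend the job", "resume the job"]
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = os.path.join(tmp, "job1-output.jsonl")
        run_map_phase(lines, checkpoint)
        # "Suspended" state: nothing is running, only the checkpoint exists.
        # Job 2 can be started at any later time.
        return run_reduce_phase(checkpoint)


if __name__ == "__main__":
    print(demo())
```

The extra cost mentioned above shows up here as well: the map output is
written to storage once and then read back in full before any reducing can
start.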

I have looked at the ChainedJobs and CapacityScheduler classes to see if I
could implement a way to suspend and resume a job, but I did not get anything
to work. Any ideas on how to emulate suspending a job?

Sorry to say this, but I am really desperate to find a solution.


On Wed, Feb 18, 2015 at 6:53 PM, Steve Loughran <stevel@hortonworks.com>
wrote:

> Afraid not.
> When we suspend/resume a slider application, what we are doing is shutting
> down the entire application, releasing all its YARN resources and killing
> the "Application Master". The MapReduce engine runs its AM for the
> duration of the job; building up lots of state in that AM as to what is
> happening. Tez runs for longer, but it can dynamically change cluster size
> based on load.
> "Hadoop pre-emption" is a mechanism by which your cluster can be set up so
> that higher priority workloads can cause containers of lower-priority jobs
> to get killed, "pre-empted". Maybe that could be useful.
> -Steve
> On 18 February 2015 at 17:22:57, xeonmailinglist
> (xeonmailinglist@gmail.com) wrote:
> Hi,
> I noticed that YARN does not suspend or resume a MapReduce job that it
> is executing. Then I found Apache Slider.
> Is it possible to submit a MapReduce job with Slider, and suspend and
> resume the job while it is executing?
> Thanks,
