hadoop-hdfs-user mailing list archives

From Nicolae Marasoiu <nicolae.maras...@adswizz.com>
Subject map task frozen from master(s)' perspective, but no process is there, and task log reports completion
Date Thu, 19 Nov 2015 07:37:13 GMT
Hi,

I have a map task "slot" occupied by a task that has made no progress for hours, and that YARN
in fact reports as NEW and STARTING. (Since we use YARN / Hadoop 2, this is not a slot per se,
but the resource mechanism works like dynamically computed slots: for instance, in the current
config at most 5 map+reduce tasks run at the same time. I cannot change this while the job is
still running, right?)
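For readers less familiar with how YARN's "dynamic slots" arise: the number of tasks that can run at once on a node is roughly how many containers of the requested size fit into the resources the NodeManager advertises. The sketch below assumes memory and vcores are the only constraints and ignores scheduler rounding (e.g. `yarn.scheduler.minimum-allocation-mb`) and the ApplicationMaster's own container; whether vcores count at all depends on the scheduler's resource calculator. The function names and all numbers are hypothetical, chosen only to show how a figure like 5 can come about, and are not this cluster's configuration.

```python
def containers_per_node(node_mem_mb: int, node_vcores: int,
                        task_mem_mb: int, task_vcores: int = 1,
                        count_vcores: bool = True) -> int:
    """Estimate how many task containers of one size fit on a single node.

    Hypothetical helper: a simplification of YARN scheduling that ignores
    allocation rounding and the ApplicationMaster container.
    """
    by_memory = node_mem_mb // task_mem_mb
    if not count_vcores:
        # Memory-only accounting, as when the scheduler ignores CPU.
        return by_memory
    by_cpu = node_vcores // task_vcores
    # The scarcer resource limits concurrency.
    return min(by_memory, by_cpu)


def cluster_task_capacity(nodes: int, **per_node) -> int:
    """Upper bound on concurrent tasks across identical nodes (hypothetical)."""
    return nodes * containers_per_node(**per_node)


if __name__ == "__main__":
    # Hypothetical values standing in for yarn.nodemanager.resource.memory-mb = 8192,
    # yarn.nodemanager.resource.cpu-vcores = 8 and mapreduce.map.memory.mb = 1536.
    print(containers_per_node(8192, 8, 1536))  # 5
```

Note that the per-task container size (e.g. `mapreduce.map.memory.mb`) is taken from the job configuration at submission time, which is why this kind of estimate only changes for newly submitted jobs.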

I found the task's log, which shows that it completed:
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 63496; bufvoid = 104857600
2015-11-19 04:01:14,719 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396(104857584); kvend = 26201248(104804992); length = 13149/6553600
2015-11-19 04:01:14,851 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-11-19 04:01:14,858 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1447872797537_0001_m_002241_0 is done. And is in the process of committing
2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1447872797537_0001_m_002241_0' done.
2015-11-19 04:01:14,889 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2015-11-19 04:01:14,890 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2015-11-19 04:01:14,890 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
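The log above shows the attempt reaching `Task '...' done.`, i.e. it finished from the task's own point of view. When many attempts have to be checked, these completion markers can be pulled out of the task syslogs with a small script. This is a minimal sketch, assuming the syslog text is already available locally and that only the two message formats quoted above matter; `attempt_status` is a hypothetical helper, not part of Hadoop. It tolerates the line wrapping seen in mail archives by allowing any whitespace before "is done." and "done.".

```python
import re
from typing import Dict

# Task attempt ids look like attempt_<clusterTs>_<jobNum>_<m|r>_<taskNum>_<attemptNum>.
ATTEMPT = r"(attempt_\d+_\d+_[mr]_\d+_\d+)"
# "Task:<id> is done. And is in the process of committing"
COMMITTING_RE = re.compile(
    r"Task:" + ATTEMPT + r"\s+is done\. And is in the process of committing")
# "Task '<id>' done."
DONE_RE = re.compile(r"Task '" + ATTEMPT + r"'\s+done\.")


def attempt_status(log_text: str) -> Dict[str, str]:
    """Map each attempt id to the most advanced completion marker found.

    "done" overrides "committing" because it is applied second.
    """
    status: Dict[str, str] = {}
    for m in COMMITTING_RE.finditer(log_text):
        status[m.group(1)] = "committing"
    for m in DONE_RE.finditer(log_text):
        status[m.group(1)] = "done"
    return status
```

An attempt that shows only "committing", or nothing at all, would be the one to look at more closely.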

My hypothesis is that the task could not report its progress or completion to the application
master, but in that case shouldn't the master have timed it out?
Is there any way to kill the task attempt so that it gets restarted?
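Two pieces of documented Hadoop behavior bear on these questions. `mapreduce.task.timeout` (default 600000 ms in `mapred-default.xml`, 0 disables it) is the period after which a task that neither reads input, writes output, nor updates its status is terminated; and the `mapred job` CLI has `-kill-task <task-attempt-id>` (killed attempts are not counted against failed attempts) and `-fail-task <task-attempt-id>`. The sketch below only models those two rules; whether the application master applies the timeout to an attempt it still sees as STARTING is exactly what is in question here, and the helper names are hypothetical.

```python
import re
from typing import List

# Default of mapreduce.task.timeout in mapred-default.xml, in milliseconds.
DEFAULT_TASK_TIMEOUT_MS = 600_000

ATTEMPT_ID_RE = re.compile(r"attempt_\d+_\d+_[mr]_\d+_\d+")


def is_past_timeout(last_progress_ms: int, now_ms: int,
                    timeout_ms: int = DEFAULT_TASK_TIMEOUT_MS) -> bool:
    """True if no progress was reported for longer than the timeout.

    A timeout of 0 (or less) disables the check, mirroring the documented
    meaning of mapreduce.task.timeout = 0.
    """
    if timeout_ms <= 0:
        return False
    return now_ms - last_progress_ms > timeout_ms


def kill_task_command(attempt_id: str, fail: bool = False) -> List[str]:
    """Build the `mapred job` argv that kills (or fails) one task attempt."""
    if not ATTEMPT_ID_RE.fullmatch(attempt_id):
        raise ValueError(f"not a task attempt id: {attempt_id!r}")
    # -kill-task: not counted against failed attempts; -fail-task: counted.
    return ["mapred", "job", "-fail-task" if fail else "-kill-task", attempt_id]
```

On a machine with a configured Hadoop client, the command could then be run with `subprocess.run(kill_task_command("attempt_1447872797537_0001_m_002241_0"), check=True)`; building the argv separately keeps the id validation testable without a cluster.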

Please advise,
Nicu
