hadoop-user mailing list archives

From Krishna Rao <krishnanj...@gmail.com>
Subject Job froze for hours because of an unresponsive disk on one of the task trackers
Date Thu, 27 Mar 2014 09:22:48 GMT

We have a daily Hive script that usually takes a few hours to run. The
other day I noticed one of the jobs was taking in excess of a few hours.
Digging into it, I saw that there were 3 attempts to launch tasks on a
single node:

Task Id Start Time Finish Time
task_201312241250_46714_r_000048 Error launching task
task_201312241250_46714_r_000049 Error launching task
task_201312241250_46714_r_000050 Error launching task

I later found out that this node had a dodgy/unresponsive disk (still being
tested right now).

We've seen tasks fail in the past, but they were resubmitted to another
node and succeeded. So shouldn't this task have been kicked off on another
node after the first failure? Is there a configuration setting I could be
missing?

We're using CDH4.4.0.
