hadoop-common-issues mailing list archives

From "Devaraj K (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-7130) Map Reduce Tasks are continuously failing, when one among the several hard disks available on the TaskTracker fails.
Date Fri, 25 Feb 2011 14:27:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K updated HADOOP-7130:
------------------------------

    Fix Version/s: 0.20.4
           Status: Patch Available  (was: Open)

A patch is provided for the 0.20 branch.

> Map Reduce Tasks are continuously failing, when one among the several hard disks available on the TaskTracker fails.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7130
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7130
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.20.2, 0.20.3, 0.20-append
>            Reporter: Devaraj K
>             Fix For: 0.20.4
>
>         Attachments: HADOOP-7130.patch
>
>
> 1. Pull out one hard disk from a TaskTracker node (one of its 10 disks). Some jobs start failing, but processing continues.
> 2. Wait for some time (15 minutes) and pull out one disk from another TaskTracker node.
> 3. More jobs fail now, as can be seen in the UI, and processing gets paused.
> The exception can be seen in the JobTracker UI for a failed job:
> {code:xml} 
> Error initializing attempt_201010221528_10174_m_000011_0:
> java.io.IOException: Expecting a line not the end of stream
>  at org.apache.hadoop.fs.DF.parseExecResult(DF.java:110)
>  at org.apache.hadoop.util.Shell.runCommand(Shell.java:182)
>  at org.apache.hadoop.util.Shell.run(Shell.java:137)
>  at org.apache.hadoop.fs.DF.getAvailable(DF.java:74)
>  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:385)
>  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:134)
>  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:113)
>  at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:835)
>  at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1790)
>  at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:104)
>  at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1753)
> Error initializing attempt_201010221528_10174_m_000011_1:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201010221528_10174/work
>  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:454)
>  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:134)
>  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:113)
>  at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:835)
>  at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1790)
>  at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:104)
>  at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1753)
> {code} 
> The TaskTracker log shows the following:
> {code:xml} 
> 2010-10-25 16:36:24,215 ERROR mapred.TaskTracker (TaskTracker.java:offerService(1211)) - Caught exception: java.io.IOException: Expecting a line not the end of stream
>         at org.apache.hadoop.fs.DF.parseExecResult(DF.java:110)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:182)
>         at org.apache.hadoop.util.Shell.run(Shell.java:137)
>         at org.apache.hadoop.fs.DF.getAvailable(DF.java:74)
>         at org.apache.hadoop.mapred.TaskTracker.getFreeSpace(TaskTracker.java:1586)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1274)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1848)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3022)
> 2010-10-25 16:36:24,216 INFO  mapred.TaskTracker (TaskTracker.java:run(1856)) - Lost connection to JobTracker [/192.168.97.1:9001].  Retrying...
> java.lang.Exception: java.io.IOException: Expecting a line not the end of stream
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1212)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1848)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3022)
> Caused by: java.io.IOException: Expecting a line not the end of stream
>         at org.apache.hadoop.fs.DF.parseExecResult(DF.java:110)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:182)
>         at org.apache.hadoop.util.Shell.run(Shell.java:137)
>         at org.apache.hadoop.fs.DF.getAvailable(DF.java:74)
>         at org.apache.hadoop.mapred.TaskTracker.getFreeSpace(TaskTracker.java:1586)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1274)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1106)
>         ... 2 more
> 2010-10-25 16:36:29,550 INFO  mapred.TaskTracker (TaskTracker.java:transmitHeartBeat(1256)) - Resending 'status' to '192.168.97.1' with reponseId '18361
> 2010-10-25 16:36:29,550 WARN  mapred.TaskTracker (TaskTracker.java:checkLocalDirs(2982)) - Task Tracker local can not create directory: /hdfsdata/0/mapred/local
> 2010-10-25 16:36:32,656 WARN  mapred.TaskTracker (TaskTracker.java:checkLocalDirs(2982)) - Task Tracker local can not create directory: /hdfsdata/0/mapred/local
> {code} 
> This seems to be fixed in the trunk.
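
For illustration only, and not the contents of the attached HADOOP-7130.patch: in the first trace the IOException raised by the free-space check (DF.getAvailable) propagates out of LocalDirAllocator.getLocalPathForWrite, so a single failed disk aborts the whole allocation even though other disks are healthy. A minimal sketch of the tolerant alternative, assuming a hypothetical SpaceProbe interface in place of the df call and a hypothetical LocalDirPicker class, might look like the following. It simply picks the directory with the most free space, which is a simplification and not necessarily how Hadoop chooses among directories.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper; not a Hadoop class. */
final class LocalDirPicker {

    /** Hypothetical stand-in for a free-space check such as running df on a directory. */
    interface SpaceProbe {
        long availableBytes(String dir) throws IOException;
    }

    /**
     * Returns the directory with the most free space that can hold neededBytes.
     * A directory whose probe throws IOException (for example, a pulled disk)
     * is skipped instead of failing the whole allocation.
     */
    static String pickDir(List<String> dirs, long neededBytes, SpaceProbe probe) {
        String best = null;
        long bestFree = -1L;
        for (String dir : dirs) {
            long free;
            try {
                free = probe.availableBytes(dir);
            } catch (IOException e) {
                // Treat the directory as unusable and try the next one.
                continue;
            }
            if (free >= neededBytes && free > bestFree) {
                best = dir;
                bestFree = free;
            }
        }
        if (best == null) {
            // Unchecked exception used here only to keep the sketch short;
            // the trace above shows Hadoop raising DiskErrorException in this case.
            throw new IllegalStateException("Could not find any valid local directory");
        }
        return best;
    }
}
```

With three directories where the probe for the first one throws, pickDir returns one of the remaining two as long as it has enough space, and fails only when no directory qualifies.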

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
