hadoop-mapreduce-user mailing list archives

From Virajith Jalaparti <virajit...@gmail.com>
Subject "No space left on device" and "Could not find any valid local directory for taskTracker/jobcache/"
Date Thu, 23 Jun 2011 14:09:26 GMT
Hi,

I am trying to run a sort job (from hadoop-0.20.2-examples.jar) on 50GB of
data (generated using randomwriter). I am using hadoop-0.20.2 on a cluster
of 3 machines with one machine serving as the master and the other two as
slaves.
I get the following errors for various task attempts:
=======================================================================
11/06/23 07:57:14 INFO mapred.JobClient: Task Id : attempt_201106230747_0001_m_000119_0, Status : FAILED
Error: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:282)
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:190)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1298)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)

Error initializing attempt_201106230747_0001_m_000119_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201106230747_0001/job.xml
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
        at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)
=======================================================================

The dfsadmin -report gives me the following:

==================================================================
Configured Capacity: 465230045184 (433.28 GB)
Present Capacity: 440799092736 (410.53 GB)
DFS Remaining: 371988148224 (346.44 GB)
DFS Used: 68810944512 (64.09 GB)
DFS Used%: 15.61%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 10.1.1.4:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 32243871744 (30.03 GB)
Non DFS Used: 12215377920 (11.38 GB)
DFS Remaining: 188155772928(175.23 GB)
DFS Used%: 13.86%
DFS Remaining%: 80.89%
Last contact: Thu Jun 23 08:04:51 MDT 2011


Name: 10.1.1.3:50010
Decommission Status : Normal
Configured Capacity: 232615022592 (216.64 GB)
DFS Used: 36567072768 (34.06 GB)
Non DFS Used: 12215574528 (11.38 GB)
DFS Remaining: 183832375296(171.21 GB)
DFS Used%: 15.72%
DFS Remaining%: 79.03%
Last contact: Thu Jun 23 08:04:51 MDT 2011

==================================================================
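As a sanity check on the report above, here is a minimal Python sketch (not part of the Hadoop tooling) that re-derives the cluster totals from the two per-node entries. The names `NODES` and `cluster_totals` are made up for this illustration, and the formulas are simply the ones the numbers in this report happen to satisfy. Note that the report only describes HDFS capacity as seen by the datanodes, while the failing writes in the traces go through RawLocalFileSystem and LocalDirAllocator, that is, the local directories.

```python
# Per-node byte counts copied from the dfsadmin -report output above.
NODES = {
    "10.1.1.4:50010": {
        "configured": 232615022592,
        "dfs_used": 32243871744,
        "non_dfs_used": 12215377920,
        "dfs_remaining": 188155772928,
    },
    "10.1.1.3:50010": {
        "configured": 232615022592,
        "dfs_used": 36567072768,
        "non_dfs_used": 12215574528,
        "dfs_remaining": 183832375296,
    },
}


def cluster_totals(nodes):
    """Sum the per-node figures and derive the cluster-level summary lines."""
    totals = {
        key: sum(node[key] for node in nodes.values())
        for key in ("configured", "dfs_used", "non_dfs_used", "dfs_remaining")
    }
    # In this report, present capacity equals DFS used plus DFS remaining,
    # which is the same as configured capacity minus non-DFS usage.
    totals["present"] = totals["dfs_used"] + totals["dfs_remaining"]
    totals["dfs_used_pct"] = round(100.0 * totals["dfs_used"] / totals["present"], 2)
    return totals


if __name__ == "__main__":
    for name, value in cluster_totals(NODES).items():
        print(f"{name}: {value}")
```

The derived totals match the summary lines at the top of the report, so the report itself is internally consistent. It does not say how much room is left under mapred.local.dir while map tasks are spilling.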



I have the following parameters configured in core-site.xml and
mapred-site.xml

*core-site.xml:*
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/local/mapred/</value>
</property>
</configuration>

*mapred-site.xml:*
  <property>
    <name>mapred.system.dir</name>
    <value>/mnt/local/mapred/system</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/local/mapred/local</value>
  </property>

  <property>
    <name>mapred.temp.dir</name>
    <value>/mnt/local/mapred/temp</value>
  </property>

/mnt/ is on a local disk at each node in my cluster, and it is just 17% full
with a total disk capacity of around 220GB. Each of the above directories
was created with read/write permissions.
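One way to double-check the local side on each slave is sketched below, assuming Python is available on the nodes. The helper name `local_dir_headroom` is made up for this illustration. It reports both free bytes and free inodes for the filesystem holding a given directory, because "No space left on device" can also be raised when a filesystem runs out of inodes even though bytes remain. Running it while the job is in progress may be more telling than a reading taken before or after the job.

```python
import os
import shutil
import sys


def local_dir_headroom(path):
    """Return free-space and free-inode figures for the filesystem holding path."""
    usage = shutil.disk_usage(path)   # total, used, free (bytes)
    vfs = os.statvfs(path)            # POSIX only; includes inode counts
    free_fraction = usage.free / usage.total if usage.total else None
    # Some filesystems report 0 total inodes; treat that as "unknown".
    inode_free_fraction = vfs.f_favail / vfs.f_files if vfs.f_files else None
    return {
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "inode_free_fraction": inode_free_fraction,
    }


if __name__ == "__main__":
    # Example: python headroom.py /mnt/local/mapred/local
    for directory in sys.argv[1:]:
        print(directory, local_dir_headroom(directory))
```

Passing /mnt/local/mapred/local (the mapred.local.dir value above) on each slave shows the headroom on the filesystem that directory actually resides on, which may or may not be the same filesystem as /mnt/.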


I don't see why I am getting the "No space left on device" error with these
configurations. Any ideas how to solve this problem?

Thanks,
Virajith
