hadoop-common-user mailing list archives

From himanshu chandola <himanshu_cool...@yahoo.com>
Subject large reducer output with same key
Date Thu, 31 Dec 2009 10:10:10 GMT
Hi Everyone,
Most of the data in my reducer output has the same key. The reducer output is close
to 16 GB, and although my cluster has a terabyte of space in HDFS in total, I get errors like
the following:

> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for task_200808021906_0002_m_000014_2/spill4.out

After such failures, Hadoop tries to restart the same reduce task a couple of times on other
nodes before the job fails. From the exception, it looks to me like this is probably a disk
space error (some machines have less than 16 GB of free space on HDFS).

So my question is whether Hadoop stores all values that share the same key as a single block
on one node. Or could something else be happening here?
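
To make the question concrete, here is a minimal sketch of hash partitioning, assuming the job uses Hadoop's default hash partitioner, which assigns a key to a reduce task with (hashCode & Integer.MAX_VALUE) % numReduceTasks. The class and method names below are hypothetical and do not use the Hadoop API; the sketch only illustrates why records with the same key would all land in the same reduce partition.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone illustration; not part of Hadoop.
public class PartitionSketch {

    // Same formula as the default hash partitioner (assumed to be in use):
    // mask off the sign bit, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many records each partition would receive.
    static Map<Integer, Integer> countPerPartition(String[] keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A skewed key set: most records share one key (made-up sample data).
        String[] keys = {"hot", "hot", "hot", "hot", "hot", "cold"};
        Map<Integer, Integer> counts = countPerPartition(keys, 4);
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            System.out.println("partition " + e.getKey() + " -> " + e.getValue() + " records");
        }
    }
}
```

Under this assumption, every record with the dominant key maps to one partition, so a single task would handle nearly all of the data regardless of how many reducers are configured.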



