hadoop-common-dev mailing list archives

From Stefan Groschupf <...@media-style.com>
Subject io.skip.checksum.errors was: Re: Hung job
Date Sun, 12 Mar 2006 02:33:44 GMT
Hi Stack,

Try setting io.skip.checksum.errors to true in your hadoop-site.xml:

<property>
   <name>io.skip.checksum.errors</name>
   <value>true</value>
   <description>If true, when a checksum error is encountered while
   reading a sequence file, entries are skipped, instead of throwing an
   exception.</description>
</property>

This may solve your problem, but I agree that there should be a
smarter way than just ignoring it. :)
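
For what it's worth, the same flag can also be flipped per job from code
instead of hadoop-site.xml. The sketch below is only illustrative and
assumes the plain string-valued Configuration.set() your build ships
with; the class name SkipChecksumErrors is made up:

import org.apache.hadoop.mapred.JobConf;

// Minimal sketch, not from the tree: override io.skip.checksum.errors
// for a single job so checksum errors in sequence files are skipped
// instead of thrown.
public class SkipChecksumErrors {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    job.set("io.skip.checksum.errors", "true");
    // Confirm the override took effect before submitting the job.
    System.out.println("io.skip.checksum.errors = " + job.get("io.skip.checksum.errors"));
  }
}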

Stefan


On 10.03.2006 at 20:22, stack wrote:

> On hadoop-users, I've described two recent hangs.  I'm writing here
> to dev because I'm looking for pointers on how best to conjure a fix,
> and to any existing facility I might exploit (I do not know the
> codebase well).
>
> In synopsis, the problem goes as follows:
>
> If a reduce cannot pick up map outputs -- for example, because the
> output has been moved aside after a ChecksumException (see the stack
> trace below) -- then the job gets stuck, with the reduce task trying
> and failing every ten seconds or so to pick up the non-existent map
> output part.
>
> Somehow the reduce needs to give up and the jobtracker needs to  
> rerun the map just as it would if the tasktracker had died completely.
>
> Thanks in advance for any pointers,
> St.Ack
>
>
> 060309 014426 Moving bad file /0/hadoop/tmp/task_m_bq2g76/part-20.out to /0/bad_files/part-20.out.2002824050
> 060309 014426 Server handler 0 on 50040 caught:
> org.apache.hadoop.fs.ChecksumException: Checksum error: /0/hadoop/tmp/task_m_bq2g76/part-20.out at 2649600
> org.apache.hadoop.fs.ChecksumException: Checksum error: /0/hadoop/tmp/task_m_bq2g76/part-20.out at 2649600
>      at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:122)
>      at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:98)
>      at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:158)
>      at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
>      at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>      at java.io.DataInputStream.read(DataInputStream.java:80)
>      at org.apache.hadoop.mapred.MapOutputFile.write(MapOutputFile.java:110)
>      at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:117)
>      at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:64)
>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:215)
>
>
>
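
To make the idea above a bit more concrete, the give-up logic stack
describes might look roughly like the sketch below. The names
(ReduceCopier, fetchFailed, reportLostMapOutput) and the retry cap are
hypothetical, not anything in the current codebase; the point is just
"stop retrying after N failures and tell the jobtracker the map output
is lost so it reschedules the map":

// Hypothetical sketch only -- these names are illustrative, not real
// classes or methods in Hadoop.
public class ReduceCopier {
  private static final int MAX_FETCH_FAILURES = 10; // give up after this many attempts
  private int fetchFailures = 0;

  /** Called each time fetching a map output part fails. */
  void fetchFailed(String mapTaskId) {
    fetchFailures++;
    if (fetchFailures >= MAX_FETCH_FAILURES) {
      // Instead of polling every ten seconds forever, declare the map
      // output lost so the jobtracker reruns the map, just as it would
      // if the tasktracker holding it had died.
      reportLostMapOutput(mapTaskId);
    }
  }

  private void reportLostMapOutput(String mapTaskId) {
    // An RPC back to the jobtracker would go here.
  }
}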

---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com


