hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-3666) SequenceFile RecordReader should skip bad records
Date Mon, 30 Jun 2008 15:47:45 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-3666.
-----------------------------------

    Resolution: Duplicate

This is a duplicate of HADOOP-153.

> SequenceFile RecordReader should skip bad records
> -------------------------------------------------
>
>                 Key: HADOOP-3666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3666
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>
> Currently a bad record in a SequenceFile causes the entire job to fail. The best
> workaround is to skip the errant file manually (by looking at which map task failed).
> This is a poor option because it's manual and because one should be able to skip a
> single SequenceFile block instead of the entire file.
> While we don't see this often (and I don't know why this corruption happened), here's
> an example stack:
> Status : FAILED java.lang.NegativeArraySizeException
> 	at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:96)
> 	at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:75)
> 	at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:130)
> 	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1640)
> 	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1712)
> 	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
> 	at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:176)
> Ideally the RecordReader should just skip the entire chunk if it gets an unrecoverable
> error while reading.
> This was the consensus in HADOOP-153 as well (that data corruptions should be handled
> by RecordReaders), and HADOOP-3144 did something similar for TextInputFormat.
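
The stack above is the signature of a corrupted record length: NegativeArraySizeException
can only come from allocating a negative-sized array, so the length prefix that
BytesWritable.readFields() reads must have been garbled before setCapacity() tried to
grow the buffer. Below is a minimal sketch of the skip-on-error behavior requested here,
assuming the 0.17-era SequenceFile.Reader API (sync(long), getPosition()) and
BytesWritable keys/values as in the reported stack; the class name and skipping policy
are illustrative, not part of Hadoop or the eventual HADOOP-153 fix.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;

/**
 * Illustrative sketch only: scan a SequenceFile and, when a record is
 * unreadable, seek past it to the next sync marker instead of failing
 * the whole job.
 */
public class SkippingSequenceFileScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path file = new Path(args[0]);
    FileSystem fs = file.getFileSystem(conf);

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    BytesWritable key = new BytesWritable();    // assumed key/value types
    BytesWritable value = new BytesWritable();
    long badChunks = 0;
    try {
      while (true) {
        long pos = reader.getPosition();        // where this record began
        try {
          if (!reader.next(key, value)) {
            break;                              // clean end of file
          }
          // ... hand (key, value) to the mapper here ...
        } catch (IOException e) {               // e.g. checksum or framing errors
          badChunks++;
          reader.sync(pos + 1);                 // jump to the next sync point
        } catch (NegativeArraySizeException e) { // corrupt length, as in the stack
          badChunks++;
          reader.sync(pos + 1);
        }
      }
    } finally {
      reader.close();
    }
    System.err.println("skipped " + badChunks + " corrupt chunk(s) in " + file);
  }
}

In a real RecordReader the same catch-and-sync would live inside next(), bounded by the
end of the input split so a task never reads beyond its own slice, which is presumably
why the consensus above puts this responsibility in the RecordReader.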

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

