hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-3144) better fault tolerance for corrupted text files
Date Mon, 31 Mar 2008 23:52:24 GMT
better fault tolerance for corrupted text files

                 Key: HADOOP-3144
                 URL: https://issues.apache.org/jira/browse/HADOOP-3144
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.15.3
            Reporter: Joydeep Sen Sarma

Every once in a while we encounter corrupted text files (corrupted at the source, prior to copying
into Hadoop). Inevitably, some of the corrupted data looks like one extremely long line, and Hadoop
fails with an out-of-memory error while trying to load that line into a single in-memory object.
The code looks the same in trunk as well.

So we are looking for an option on TextInputFormat (and similar input formats) to ignore long lines.
Ideally, we would just skip errant lines above a certain size limit.
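
To illustrate the requested behavior, here is a minimal sketch in plain Java. It is not Hadoop code and not a proposed patch: the class name BoundedLineReader, the maxLineBytes parameter, and the skipped-line counter are all hypothetical names made up for this example. It assumes '\n'-terminated UTF-8 lines. The point is that an oversized line is discarded byte by byte, so it is never held in memory in full.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Hadoop): reads '\n'-terminated lines and
 * silently drops any line longer than maxLineBytes instead of buffering it.
 */
final class BoundedLineReader {
    private final InputStream in;
    private final int maxLineBytes;   // assumed limit, chosen by the caller
    private long skippedLines = 0;

    BoundedLineReader(InputStream in, int maxLineBytes) {
        this.in = new BufferedInputStream(in);
        this.maxLineBytes = maxLineBytes;
    }

    /** Returns the next line within the limit, or null at end of input. */
    String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean tooLong = false;  // current line already exceeded the limit
        boolean sawAny = false;   // at least one byte read for the current line
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                if (!tooLong) {
                    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
                }
                // The oversized line has ended: count it and start a fresh line.
                skippedLines++;
                tooLong = false;
                sawAny = false;
                continue;
            }
            if (tooLong) {
                continue;         // discard the rest of the oversized line
            }
            if (buf.size() >= maxLineBytes) {
                tooLong = true;   // limit exceeded: free what was buffered
                buf.reset();
            } else {
                buf.write(b);
            }
        }
        if (tooLong) {            // input ended inside an oversized line
            skippedLines++;
            return null;
        }
        return sawAny ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    long getSkippedLines() {
        return skippedLines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ok\nxxxxxxxxxxxx\nfine".getBytes(StandardCharsets.UTF_8);
        BoundedLineReader reader = new BoundedLineReader(new ByteArrayInputStream(data), 5);
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println("skipped lines: " + reader.getSkippedLines());
    }
}
```

In this sketch memory use per line is bounded by maxLineBytes regardless of how long the corrupted line is. Counting the skipped lines rather than dropping them silently would let a job report how much input it ignored.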

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.