hadoop-hive-dev mailing list archives

From "Vladimir Klimontovich (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-1419) Policy on deserialization errors
Date Sun, 20 Jun 2010 09:33:25 GMT
Policy on deserialization errors
--------------------------------

                 Key: HIVE-1419
                 URL: https://issues.apache.org/jira/browse/HIVE-1419
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.5.0
            Reporter: Vladimir Klimontovich
            Assignee: Vladimir Klimontovich
            Priority: Minor
             Fix For: 0.5.1, 0.6.0


When a deserializer throws an exception, the whole map task fails (see MapOperator.java).
This is not always convenient, especially on huge datasets where a few corrupted lines
can be normal. Proposed solution:

1) Keep a counter of corrupted records.
2) When the counter exceeds a limit (configurable via the hive.max.deserializer.errors property,
0 by default), throw an exception. Otherwise, just log the exception at WARN level.
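A minimal Java sketch of the proposed policy, assuming a hypothetical DeserializerErrorPolicy class that stands in for the error handling around deserialization in MapOperator. It is not the attached patch. The deserializer is modeled as a java.util.function.Function that throws unchecked exceptions (Hive's real Deserializer interface throws the checked SerDeException), configuration is read from a plain java.util.Properties rather than Hadoop's Configuration, and java.util.logging's WARNING level stands in for WARN. Only the property name hive.max.deserializer.errors and its default of 0 come from the proposal.

```java
import java.util.Properties;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of the HIVE-1419 proposal: count deserialization
 * errors and fail only once the count exceeds a configured limit.
 * Not Hive's actual MapOperator code.
 */
final class DeserializerErrorPolicy {
    // Property name as proposed in the issue; default 0 keeps fail-fast behavior.
    static final String MAX_ERRORS_KEY = "hive.max.deserializer.errors";

    private static final Logger LOG =
        Logger.getLogger(DeserializerErrorPolicy.class.getName());

    private final long maxErrors;
    private long errorCount = 0;

    DeserializerErrorPolicy(Properties conf) {
        this.maxErrors = Long.parseLong(conf.getProperty(MAX_ERRORS_KEY, "0"));
    }

    /**
     * Deserializes one raw record. Returns null if the record is corrupted
     * but the error count is still within the limit; throws once the
     * count exceeds the limit.
     */
    <R> R process(String rawRecord, Function<String, R> deserializer) {
        try {
            return deserializer.apply(rawRecord);
        } catch (RuntimeException e) {
            errorCount++;
            if (errorCount > maxErrors) {
                // Limit exceeded: fail the task, as MapOperator does today.
                throw new IllegalStateException(
                    "Too many deserialization errors: " + errorCount
                        + " > " + maxErrors, e);
            }
            // Within the limit: log at warning level and skip the record.
            LOG.log(Level.WARNING, "Skipping corrupted record: " + rawRecord, e);
            return null;
        }
    }

    long getErrorCount() {
        return errorCount;
    }
}
```

With the default limit of 0, the first corrupted record pushes the counter to 1, which exceeds the limit, so the task still fails immediately as it does today. Setting the property to, say, 100 lets a task skip up to 100 bad records and fail on the 101st.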

Patches for the 0.5 branch and trunk are attached.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

