hive-dev mailing list archives

From "Vladimir Klimontovich (JIRA)" <>
Subject [jira] Created: (HIVE-1419) Policy on deserialization errors
Date Sun, 20 Jun 2010 09:33:25 GMT
Policy on deserialization errors

                 Key: HIVE-1419
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 0.5.0
            Reporter: Vladimir Klimontovich
            Assignee: Vladimir Klimontovich
            Priority: Minor
             Fix For: 0.5.1, 0.6.0

When a deserializer throws an exception, the whole map task fails (see file).
This behavior is not always convenient, especially on huge datasets where a few corrupted
lines are to be expected. Proposed solution:

1) Keep a counter of corrupted records.
2) When the counter exceeds a limit (configurable via the hive.max.deserializer.errors property,
0 by default), throw an exception. Otherwise, just log the exception at WARN level.
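
The two steps above can be sketched roughly as follows. This is only an illustration of the counting policy, not the attached patch: the class name TolerantDeserializer, the use of java.util.function.Function as a stand-in for a real Hive deserializer, and the choice to return null for a skipped record are all assumptions made for the example. A real deserializer would throw a checked exception rather than a RuntimeException, and the limit would be read from the hive.max.deserializer.errors property.

```java
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical wrapper that tolerates a bounded number of corrupted records.
 * Not part of Hive; the delegate Function stands in for a real deserializer.
 */
final class TolerantDeserializer<I, O> {
    private static final Logger LOG =
            Logger.getLogger(TolerantDeserializer.class.getName());

    private final Function<I, O> delegate;
    private final long maxErrors; // assumed to come from hive.max.deserializer.errors
    private long errorCount = 0;

    TolerantDeserializer(Function<I, O> delegate, long maxErrors) {
        this.delegate = delegate;
        this.maxErrors = maxErrors;
    }

    /** Returns the deserialized record, or null if the record was skipped. */
    O deserialize(I raw) {
        try {
            return delegate.apply(raw);
        } catch (RuntimeException e) {
            errorCount++;
            if (errorCount > maxErrors) {
                // Limit exceeded: fail the task, as happens today on the first error.
                throw new IllegalStateException(
                        "Too many corrupted records: " + errorCount
                                + " (limit " + maxErrors + ")", e);
            }
            // Still under the limit: log at WARN level and skip the record.
            LOG.log(Level.WARNING, "Skipping corrupted record #" + errorCount, e);
            return null;
        }
    }

    long getErrorCount() {
        return errorCount;
    }
}
```

With a limit of 0 the first corrupted record already exceeds the limit, so the default keeps the current fail-fast behavior; tolerance is opt-in.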

Patches for the 0.5 branch and trunk are attached.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
