hive-dev mailing list archives

From "Namit Jain (Created) (JIRA)"
Subject [jira] [Created] (HIVE-2658) Add an option in Hive to skip corrupted data entirely
Date Thu, 15 Dec 2011 22:01:31 GMT
Add an option in Hive to skip corrupted data entirely

                 Key: HIVE-2658
             Project: Hive
          Issue Type: New Feature
            Reporter: Namit Jain
            Assignee: He Yongqiang

Add a new parameter:

This is independent of the format of the underlying data.

The idea is as follows:

We have some corrupted data in our cluster right now.
We will run Hive over all the corrupted partitions:

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

insert overwrite table <T> partition (<P>)
select <non-partition columns of T> from <T> where <P>;

This way, <T>@<P> will be regenerated with all the data that can be read.
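To make the expected outcome concrete, here is a small Python simulation of regenerating one partition. It assumes one split, and so one mapper, per file (our understanding of what BucketizedHiveInputFormat produces), and it assumes the reader behavior proposed in this issue: a read error ends the file. The file contents, the `CORRUPT` marker, and both function names are hypothetical; this illustrates the idea and is not Hive code.

```python
from typing import Dict, List

# Hypothetical marker for a record the underlying reader cannot decode.
CORRUPT = "<corrupt>"


def readable_prefix(file_records: List[str]) -> List[str]:
    """Rows one mapper would emit for one file under the proposed option:
    everything before the first unreadable record. The rest of the file
    is treated as if it were not there."""
    rows: List[str] = []
    for rec in file_records:
        if rec == CORRUPT:
            break  # read error: behave as if end of file was reached
        rows.append(rec)
    return rows


def regenerate_partition(partition_files: Dict[str, List[str]]) -> List[str]:
    """Simulate rewriting one partition from itself, assuming each file is
    read by its own mapper, so corruption in one file never hides rows of
    another file."""
    new_rows: List[str] = []
    for name in sorted(partition_files):
        new_rows.extend(readable_prefix(partition_files[name]))
    # These rows replace the old contents of the partition.
    return new_rows


if __name__ == "__main__":
    files = {
        "file_a": ["r1", "r2"],
        "file_b": ["r3", CORRUPT, "r4"],  # r4 is lost: it follows the bad record
        "file_c": ["r5"],
    }
    print(regenerate_partition(files))  # ['r1', 'r2', 'r3', 'r5']
```

The trade-off shows up in `file_b`: rows after the first bad record are dropped even if they are intact, because the mapper stops reading that file at the error.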

If HiveRecordReader gets an exception while reading the next row, the mapper behaves as if no more data is present in the file.
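The reader change can be sketched as a wrapper around whatever record reader the input format supplies, which is also why it does not depend on the format of the underlying data. The Python below is a minimal sketch under stated assumptions, not HiveRecordReader itself: `ListReader`, `SkipCorruptReader`, `next_row`, and the `skip_corrupt` flag are hypothetical stand-ins, with the flag playing the role of the new parameter.

```python
from typing import List, Optional

# Hypothetical marker for a record the underlying reader cannot decode.
CORRUPT = "<corrupt>"


class ListReader:
    """Hypothetical underlying reader over an in-memory 'file'."""

    def __init__(self, records: List[str]) -> None:
        self._records = records
        self._pos = 0

    def next_row(self) -> Optional[str]:
        """Return the next row, None at end of file, or raise on corruption."""
        if self._pos >= len(self._records):
            return None
        rec = self._records[self._pos]
        self._pos += 1
        if rec == CORRUPT:
            raise IOError(f"cannot decode record {self._pos - 1}")
        return rec


class SkipCorruptReader:
    """Hypothetical wrapper: with skip_corrupt=True, an exception from the
    underlying reader is reported as end of file instead of failing the task."""

    def __init__(self, reader: ListReader, skip_corrupt: bool) -> None:
        self._reader = reader
        self._skip_corrupt = skip_corrupt
        self._eof = False

    def next_row(self) -> Optional[str]:
        if self._eof:
            return None
        try:
            row = self._reader.next_row()
        except Exception:
            if not self._skip_corrupt:
                raise  # option off: the error fails the mapper
            row = None  # option on: behave as if no more data is present
        if row is None:
            self._eof = True
        return row


def read_all(reader: SkipCorruptReader) -> List[str]:
    """Drain the reader the way a mapper's read loop would."""
    rows: List[str] = []
    while (row := reader.next_row()) is not None:
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = ["a", "b", CORRUPT, "c"]
    print(read_all(SkipCorruptReader(ListReader(data), skip_corrupt=True)))  # ['a', 'b']
```

With the flag off, the exception propagates and the mapper fails, which is the behavior without the new option. With it on, the job completes, at the cost of silently truncating each damaged file at its first bad record.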

This message is automatically generated by JIRA.