hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <>
Subject [jira] Created: (HIVE-1968) data corruption with multi-table insert
Date Mon, 07 Feb 2011 20:02:57 GMT
data corruption with multi-table insert

                 Key: HIVE-1968
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.7.0
            Reporter: Joydeep Sen Sarma

i had to run a conversion process to compute a checksum (sum(hash(all-columns))) of a table
and convert it to a different compression format. trying to be clever - i did both of them
in a single pass by doing something equivalent to:

from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
insert overwrite table table_to_be_generated select i.col1, i.col2
insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));

the plan looked correct. however - the data produced was erroneous - the checksums and the
data were both wrong (and consistent with each other). i know this because:
- the checksum computed by the above query didn't match the checksum on the input table when
calculated separately
- the checksum of the data output by this query (first insert clause) didn't match the input
table's checksum (neither the one computed by the query above, nor the one computed separately)
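
the checks in the two bullets above could be reproduced with queries along these lines. this is only a sketch: the exact statements used are not given in this report, col1 and col2 stand in for all columns as in the query above, and the checksum expression simply mirrors that query.

```sql
-- checksum of the input table, calculated separately
-- (same expression as the second insert clause above)
select sum(hash(i.val))
from (select hash(col1, col2) as val from table_to_be_converted) i;

-- checksum of the data written by the first insert clause
select sum(hash(o.val))
from (select hash(col1, col2) as val from table_to_be_generated) o;

-- checksum written by the second insert clause
select * from table_to_be_converted_checksum;
```

for a correct conversion the three results should agree; in this report they did not.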

later on - i broke up this query into two independent ones - and the data and checksums were
good (i.e. they all matched up). so it seems like there's some data corruption happening in MTI
(multi-table insert).
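
for reference, the split could look roughly like the following. this is a sketch under the same assumptions as the query above (hypothetical column names col1 and col2, same table names), not the exact statements that were run.

```sql
-- query 1: conversion only
insert overwrite table table_to_be_generated
select col1, col2 from table_to_be_converted;

-- query 2: checksum only, same expression as the single-pass version
from (select hash(col1, col2) as val from table_to_be_converted) i
insert overwrite table table_to_be_converted_checksum
select sum(hash(i.val));
```

each statement scans the input once, so this costs an extra pass over the table compared to the single multi-table insert.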
