hive-dev mailing list archives

From "Alan Gates (JIRA)" <>
Subject [jira] [Updated] (HIVE-8637) In insert into X select from Y, table properties from X are clobbering those from Y
Date Tue, 28 Oct 2014 21:09:33 GMT


Alan Gates updated HIVE-8637:
    Attachment: HIVE-8637.patch

This is not a permanent fix.  It works by changing HiveInputFormat.getInputSplits to
call a new method in Utilities that sets values from table properties in the job conf whether
or not they are already set.  This seems safe, since the table should properly understand
its own properties.
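
To illustrate the difference this makes, here is a minimal sketch, not the code in the attached patch. It uses a plain Map as a stand-in for the job conf, and the class name, method names, and the key "example.prop" are all hypothetical. It assumes the earlier behavior skipped keys that were already present, which is what lets a value written for table X survive into the input format for table Y.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in: a Map plays the role of the job conf.
class TablePropsSketch {

    /** Copies a table property only when the key is not yet in the conf. */
    static void copyIfAbsent(Properties tableProps, Map<String, String> jobConf) {
        for (String key : tableProps.stringPropertyNames()) {
            jobConf.putIfAbsent(key, tableProps.getProperty(key));
        }
    }

    /** Copies every table property, replacing any value already in the conf. */
    static void copyAlways(Properties tableProps, Map<String, String> jobConf) {
        for (String key : tableProps.stringPropertyNames()) {
            jobConf.put(key, tableProps.getProperty(key));
        }
    }

    public static void main(String[] args) {
        // The output table X has already written its value into the shared conf.
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("example.prop", "from-X");

        // The input table Y carries a different value for the same key.
        Properties tableY = new Properties();
        tableY.setProperty("example.prop", "from-Y");

        copyIfAbsent(tableY, jobConf);
        System.out.println(jobConf.get("example.prop")); // still from-X: X clobbers Y

        copyAlways(tableY, jobConf);
        System.out.println(jobConf.get("example.prop")); // now from-Y
    }
}
```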

I believe the correct long-term solution is to make sure a different copy of JobConf goes
to the input and output tables, so each can write whatever it wants there.  I think that would
have to be done in ExecDriver.execute, since calls to checkOutputSpecs and getInputSplits
are done by Hadoop after Hive submits the job.  I think that would fix the MR case.  I'm sure
the fix for Tez would be slightly different (since the job is submitted all at once).
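
As a rough sketch of the separate-copy idea (again with a plain Map standing in for JobConf, and with hypothetical class, method, and key names), each table would get its own copy of the shared conf, so a write made on behalf of one table is never seen by the other:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the job conf.
class PerTableConfSketch {

    /** Returns an independent copy of the shared conf for one table. */
    static Map<String, String> copyFor(Map<String, String> sharedConf) {
        return new HashMap<>(sharedConf);
    }

    public static void main(String[] args) {
        Map<String, String> sharedConf = new HashMap<>();
        sharedConf.put("example.shared", "common-setting");

        // One copy for the input side (table Y), one for the output side (table X).
        Map<String, String> inputConf = copyFor(sharedConf);
        Map<String, String> outputConf = copyFor(sharedConf);

        // Each table writes its own value for the same key.
        outputConf.put("example.prop", "from-X");
        inputConf.put("example.prop", "from-Y");

        System.out.println(inputConf.get("example.prop"));   // from-Y
        System.out.println(outputConf.get("example.prop"));  // from-X
        System.out.println(sharedConf.get("example.prop"));  // null: neither write reached the shared conf
    }
}
```

In Hadoop itself the analogous step would presumably be the JobConf(Configuration) copy constructor, but where exactly the copies would be made is the open question above.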

But this would also destroy any ability to communicate information across jobs via the conf
file.  I don't know if anything is doing that or not.  I'm loath to make that big a change
when [~hagleitn] has said he wants to cut a release in a week.

So, I propose this smaller change now, and we file a JIRA for the bigger, more complete fix.

> In insert into X select from Y, table properties from X are clobbering those from Y
> -----------------------------------------------------------------------------------
>                 Key: HIVE-8637
>                 URL:
>             Project: Hive
>          Issue Type: Task
>    Affects Versions: 0.14.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: 0.14.0
>         Attachments: HIVE-8637.patch
> With a query like:
> {code}
> insert into table X select * from Y;
> {code}
> the table properties from table X are being sent to the input formats for table Y.

This message was sent by Atlassian JIRA