hive-dev mailing list archives

From " (JIRA)" <>
Subject [jira] [Commented] (HIVE-2117) insert overwrite ignoring partition location
Date Fri, 20 May 2011 16:28:47 GMT

] commented on HIVE-2117:

This is an automatically generated e-mail. To reply, visit:

Review request for hive and Carl Steinbach.


This change resolves a regression introduced by HIVE-1707, specifically that the partition
location (set via alter table partition location) is not being respected.

I addressed this by using the user-specified partition location (as was done originally), except
in the case of cross-filesystem moves (which was the concern in HIVE-1707).
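
To illustrate the rule described above, here is a minimal sketch. It is not the patch itself: it uses plain java.net.URI instead of Hadoop's Path and FileSystem classes, the class and method names (PartitionPathSketch, resolveTarget, sameFileSystem) are hypothetical, and it assumes that "same filesystem" can be approximated by comparing URI scheme and authority. In the cross-filesystem case it assumes a fallback to the HIVE-1707 behavior quoted in the issue description (table path plus partition subdirectory, on the load path's filesystem).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical helper, for illustration only; not part of Hive.
final class PartitionPathSketch {
    private PartitionPathSketch() {}

    // Simplifying assumption: two URIs are on the same filesystem
    // when their scheme and authority match.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    // existingPartitionLocation may be null if the partition does not exist yet.
    // partitionSubdir is the default directory name, e.g. "dt=a".
    static URI resolveTarget(URI tableLocation, URI existingPartitionLocation,
                             String partitionSubdir, URI loadPath) {
        // Same filesystem: respect the location the user set on the partition.
        if (existingPartitionLocation != null
                && sameFileSystem(existingPartitionLocation, loadPath)) {
            return existingPartitionLocation;
        }
        // Otherwise derive the path from the table location and place it
        // on the filesystem of the load path (assumed HIVE-1707 behavior).
        String tablePath = tableLocation.getPath();
        String sep = tablePath.endsWith("/") ? "" : "/";
        try {
            return new URI(loadPath.getScheme(), loadPath.getAuthority(),
                    tablePath + sep + partitionSubdir, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With the example from the issue, a partition located at <path>/parta keeps that location when the load path is on the same filesystem, and the <path>/dt=a form is only used for a new partition or a cross-filesystem move.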

This addresses bug HIVE-2117.


  ql/src/java/org/apache/hadoop/hive/ql/metadata/ bcacd35 
  ql/src/test/org/apache/hadoop/hive/ql/ PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/ 06a0447 
  ql/src/test/org/apache/hadoop/hive/ql/ PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/ 8c7c0b8 
  ql/src/test/queries/clientpositive/alter5.q PRE-CREATION 
  ql/src/test/results/clientpositive/alter5.q.out PRE-CREATION 



I added a new test that explicitly verifies the partition location, as the existing tests ignore
this detail. The test fails without my fix and passes with the fix applied.



> insert overwrite ignoring partition location
> --------------------------------------------
>                 Key: HIVE-2117
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Blocker
>         Attachments: HIVE-2117_br07.patch, HIVE-2117_br07.patch, HIVE-2117_trunk.patch,
> The following code works differently in 0.5.0 vs 0.7.0.
> In 0.5.0 the partition location is respected. 
> However, in 0.7.0, while the initial partition is created with the specified location "<path>/parta",
> the "insert overwrite ..." results in the partition being written to "<path>/dt=a" (note that
> <path> is the same in both cases).
> {code}
> create table foo_stg (bar INT, car INT); 
> load data local inpath 'data.txt' into table foo_stg;
> create table foo4 (bar INT, car INT) partitioned by (dt STRING) LOCATION '/user/hive/warehouse/foo4';

> alter table foo4 add partition (dt='a') location '/user/hive/warehouse/foo4/parta';
> from foo_stg fs insert overwrite table foo4 partition (dt='a') select *;
> {code}
> From what I can tell, HIVE-1707 introduced this via a change to
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Path, String, Map<String, String>,
> boolean, boolean),
> specifically:
> {code}
> +      Path partPath = new Path(tbl.getDataLocation().getPath(),
> +          Warehouse.makePartPath(partSpec));
> +
> +      Path newPartPath = new Path(loadPath.toUri().getScheme(), loadPath
> +          .toUri().getAuthority(), partPath.toUri().getPath());
> {code}
> Reading the description of HIVE-1707, it seems that this may have been done purposefully.
> However, given that the partition location is explicitly specified for the partition in question,
> it seems like that should be honored (especially given that the table location has not changed).
> This difference in behavior is causing a regression in existing production Hive-based
> code. I'd like to take a stab at addressing this. Any suggestions?
