hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergio Pena <sergio.p...@cloudera.com>
Subject Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure
Date Tue, 06 Dec 2016 20:04:49 GMT


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q,
lines 15-16
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576929#file1576929line15>
> >
> >     Why explicit ADD PARTITION statement is required? I think insert into will create
missing partitions.

That was part of the test I did when I found the issue. But it is not needed. I will remove
it just to keep clarity on the type of testing.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q,
line 25
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576931#file1576931line25>
> >
> >     Will "SHOW PARTITIONS" be a good validation in this case?

That's a good one. I'll add it.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1763
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1763>
> >
> >     Should we take into account HIVE_BLOBSTORE_USE_BLOBSTORE_AS_SCRATCHDIR?

It's not necessary. This is just an optimization when two MoveTask are on BlobStore. The condition
in the method verifies that.

return condOutputPath.equals(linkedSourcePath)
        && BlobStorageUtils.isBlobStoragePath(conf, condInputPath)
        && BlobStorageUtils.isBlobStoragePath(conf, linkedTargetPath);
        
If the user has HIVE_BLOBSTORE_USE_BLOBSTORE_AS_SCRATCHDIR=false, then the condInputPath might
be on HDFS, and the merge won't happen.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1784
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1784>
> >
> >     Just note: s3 to s3 copy is *more* efficient than hdfs to s3 copy.

This is only merging two MoveWorks into one, so the hdfs to s3 copy does not apply here I
think. I changed the comment thought:
* This is an optimization for BlobStore systems to avoid doing two renames or copies that
are not necessary.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1885-1894
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1885>
> >
> >     I feel like this logic should be inside "addDependentMoveTasks" method.

It is confusing. The "addDependentMoveTasks" is used to link a MoveTask to the desired task.

For instance: addDependentMoveTasks(moveTaskToLink, conf, moveOnlyMoveTask, dependencyTask);
The above method will link: 
  moveOnlyMoveTask -> moveTaskToLink -> otherChildTasks
  
  
But I don't want to do that with the optimization, I want to copy the moveTaskToLink child
tasks to moveOnlyMoveTask instead.
Like:
  moveOnlyMoveTask -> otherChildTasks


On Dec. 5, 2016, 11:11 p.m., Sergio Pena wrote:
> > Could you also add a test with a strict dynamic partitioning, like:
> > INSERT OVERWRITE TABLE t2 PARTITION(p1="1", p2) SELECT \*, c1 AS p2 FROM t1;

Thanks. I will add it.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158069
-----------------------------------------------------------


On Dec. 5, 2016, 9:56 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created
when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created
for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask
may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks
to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q
PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0

>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q
PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e

>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out
PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e

>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out
9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out
PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out
c725c96cbb6b0374e67308a54204c7c25e827567 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb

>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5

>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133

>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e

>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617

>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java
e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message