hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Review Request 50359: HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
Date Fri, 29 Jul 2016 23:33:05 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50359/#review144220
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (line 1803)
<https://reviews.apache.org/r/50359/#comment210221>

    I am not sure if this change is really needed. But, if it does, won't be need equivalent
in loadPartition() & loadDynamicPartitions().



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java (lines 1832 - 1833)
<https://reviews.apache.org/r/50359/#comment210220>

    I don't follow this. Comment doesn't seem to match code. 
    FileSystem.rename() should automatically do copy+delete for S3. So, why do we need to
do that explictly?
    Per your comment, you want to delete temp dir, but that should already be handled in Context::clear()
    Per your code, you are deleting preexisting files on target dir but as I said that should
already be handled in fs.rename()



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java (line 7019)
<https://reviews.apache.org/r/50359/#comment210222>

    This will go to temp s3 location. you may want to move this to hdfs too.


- Ashutosh Chauhan


On July 28, 2016, 8:11 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50359/
> -----------------------------------------------------------
> 
> (Updated July 28, 2016, 8:11 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-14270
>     https://issues.apache.org/jira/browse/HIVE-14270
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch will create a temporary directory for Hive intermediate data on HDFS when
S3 tables are used.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f172c81fce20fe951df58f6561d28dc215

>   common/src/test/org/apache/hadoop/hive/common/TestBlobStorageUtils.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ec5d693d28a40925c44f844a05ebf3f5c10173c9

>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd1a519f79bc7fa88c3b7e5c6cc2ef0637f

>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 2671cb1cf2ef74f9d6628f8cdf3f5ac99283dbd8

>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestContext.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/50359/diff/
> 
> 
> Testing
> -------
> 
> NO PATCH
> ** NON-PARTITIONED TABLE
> 
> - create table dummy (id int);                                                      
                    3.651s
> - insert into table s3dummy values (1);                                             
                   39.231s
> - insert overwrite table s3dummy values (1);                                        
                   42.569s
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 
                   30.136s
> 
> EXTERNAL TABLE
> 
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';
      9.297s
> - insert into table s3dummy_ext values (1);                                         
                   45.855s
> 
> WITH PATCH
> 
> ** NON-PARTITIONED TABLE
> - create table s3dummy (id int) location 's3a://spena-bucket/user/hive/warehouse/s3dummy';
              3.945s
> - insert into table s3dummy values (1);                                             
                   15.025s
> - insert overwrite table s3dummy values (1);                                        
                   25.149s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummy' select * from dummy; 
                   19.158s      
> - from dummy insert overwrite table s3dummy select *;                               
                   25.469s      
> - from dummy insert into table s3dummy select *;                                    
                   14.501s
> 
> ** EXTERNAL TABLE
> - create table s3dummy_ext like s3dummy location 's3a://spena-bucket/user/hive/warehouse/s3dummy';
      4.827s
> - insert into table s3dummy_ext values (1);                                         
                   16.070s
> 
> ** PARTITIONED TABLE
> - create table s3dummypart (id int) partitioned by (part int)
>   location 's3a://spena-bucket/user/hive/warehouse/s3dummypart';                    
                    3.176s
> - alter table s3dummypart add partition (part=1);                                   
                    3.229s
> - alter table s3dummypart add partition (part=2);                                   
                    3.124s
> - insert into table s3dummypart partition (part=1) values (1);                      
                   14.876s
> - insert overwrite table s3dummypart partition (part=1) values (1);                 
                   27.594s     
> - insert overwrite directory 's3a://spena-bucket/dirs/s3dummypart' select * from dummypart;
            22.298s      
> - from dummypart insert overwrite table s3dummypart partition (part=1) select id;   
                   29.001s      
> - from dummypart insert into table s3dummypart partition (part=1) select id;        
                   14.869s
> 
> ** DYNAMIC PARTITIONS
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;       
                   15.185s
> - insert into table s3dummypart partition (part) select id, 1 from dummypart;       
                   18.820s
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message