hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Deepak Jaiswal <djais...@hortonworks.com>
Subject Re: Review Request 65130: HIVE-18350 : load data should rename files consistent with insert statements
Date Wed, 17 Jan 2018 00:12:25 GMT


> On Jan. 16, 2018, 11:06 p.m., Jason Dere wrote:
> > - Why the need for the file name change even for non-bucketed tables?
> > - With this patch, what happens to existing tables (bucketed and non-bucketed) which
contain filenames that do not match the format used here?
> > - How are external tables handled, where the user is responsible for managing the
files in the table directories? For both bucketed and non-bucketed tables.

Will update the JIRA with details.


> On Jan. 16, 2018, 11:06 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 3951 (patched)
> > <https://reviews.apache.org/r/65130/diff/3/?file=1940449#file1940449line3951>
> >
> >     Add a comment about the assumptions/actions taken here - it looks like bucketed
files are assumed to be in the correct name format? Is there any validation that the 
> >     
> >     And all non-bucketed files are renamed to 000000_0? I wonder if this will ever
cause a problem with tons of files with 000000_0, 000000_0_copy_1, 000000_0_copy_2 etc. Though
I guess this is currently what happens when INSERT INTO TABLE occurs.

Bucketed files are validated in LoadSemanticAnalyzer itself. However, I can certainly add
an assert for that. Thanks for pointing this out.
Will add a comment as well.


> On Jan. 16, 2018, 11:06 p.m., Jason Dere wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 4034 (patched)
> > <https://reviews.apache.org/r/65130/diff/3/?file=1940449#file1940449line4034>
> >
> >     This check looks ugly, but if you have to do it then maybe do what is done in
SemanticAnalyzer.analyzeCreateTable() - .startsWith(HiveConf.getVar(conf, HiveConf.ConfVars.STAGINGDIR))
> >     
> >     Also, you are invoking toString() on an array (srcs) - use srcs[0].getPath().getName().
> >     
> >     Also please add a comment here to describe why this extra check is here.

Yes it sure does. I will use the example to make it better. Will add a comment as well.


- Deepak


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65130/#review195503
-----------------------------------------------------------


On Jan. 16, 2018, 9:31 a.m., Deepak Jaiswal wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65130/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2018, 9:31 a.m.)
> 
> 
> Review request for hive, Eugene Koifman and Jason Dere.
> 
> 
> Bugs: HIVE-18350
>     https://issues.apache.org/jira/browse/HIVE-18350
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Made changes for both bucketed and non-bucketed tables.
> Added a positive test for non-bucketed table which renames the loaded file.
> Added couple of negative tests for bucketed table which reject a load with inconsistent
file name.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1a2b3c1f6c 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 4535c3edc2 
>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveCopyFiles.java cc1d8574b0 
>   ql/src/test/queries/clientnegative/load_data_bucketed_1.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/load_data_bucketed_2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/load_data_rename.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/smb_mapjoin_7.q 4a6afb0496 
>   ql/src/test/results/clientnegative/load_data_bucketed_1.q.out PRE-CREATION 
>   ql/src/test/results/clientnegative/load_data_bucketed_2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/beeline/smb_mapjoin_7.q.out 7a6f8c53a5 
>   ql/src/test/results/clientpositive/llap/load_data_rename.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/smb_mapjoin_7.q.out b71c5b87c1 
>   ql/src/test/results/clientpositive/spark/smb_mapjoin_7.q.out ac49c02913 
> 
> 
> Diff: https://reviews.apache.org/r/65130/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message