hive-dev mailing list archives

From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: Review Request 62098: HIVE-17403: Fail concatenation for unmanaged and transactional tables
Date Wed, 06 Sep 2017 00:30:36 GMT


> On Sept. 6, 2017, 12:08 a.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java
> > Line 231 (original), 256 (patched)
> > <https://reviews.apache.org/r/62098/diff/1/?file=1815918#file1815918line256>
> >
> >     hmm... is this necessary?
> >     1) how does this interact with Hive duplicate file detection that can potentially happen later?
> >     2) on the cloud, renaming files is very slow (one of the main reasons for MM tables). We should not rename unless it's really needed.
> 
> Prasanth_J wrote:
>     Yes. This is necessary. If there are no incompatible files, this is essentially a directory rename. If there are incompatible files and their file names do not match Hive's naming convention, they have to be renamed file by file into the staging directory. On the cloud (or on-prem), this only affects users who already have managed or unmanaged Hive tables with externally loaded files (via copy or LOAD DATA). In that case, renaming has to be done to avoid data loss (more info in the JIRA). Also, this patch disables concatenation on external/unmanaged tables, so users of those tables won't be affected.
>     
>     Hive's duplicate file detection does not delete files that have the _copy_ suffix (which this patch introduces on conflicts).

MM tables don't have staging directories or any renames.
As for the non-MM case, isn't it possible to just leave these files alone and not merge them?
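The alternative raised here (merge only the compatible files and leave the incompatible ones where they are, so nothing is renamed) could look roughly like the sketch below. It is hypothetical: the class and method names are made up for illustration, and what makes a file "compatible" for merging is left as a caller-supplied predicate because the thread does not define it.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of the suggested alternative: merge only compatible files
// and leave the rest in place, so no rename into a staging directory is needed.
final class SkipIncompatibleSketch {

    private SkipIncompatibleSketch() {}

    /**
     * Splits files into those to merge (key true) and those to leave untouched (key false).
     * The compatibility test is supplied by the caller; its real definition is an
     * assumption outside this sketch.
     */
    static Map<Boolean, List<String>> partition(List<String> files, Predicate<String> isCompatible) {
        // partitioningBy always returns both keys and keeps encounter order in each list.
        return files.stream().collect(Collectors.partitioningBy(isCompatible));
    }
}
```

Under this approach only the files under key true would be handed to the merge, and the files under key false would keep their original names and locations.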


- Sergey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62098/#review184607
-----------------------------------------------------------


On Sept. 5, 2017, 9:44 p.m., Prasanth_J wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62098/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2017, 9:44 p.m.)
> 
> 
> Review request for hive and Sergey Shelukhin.
> 
> 
> Bugs: HIVE-17403
>     https://issues.apache.org/jira/browse/HIVE-17403
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-17403: Fail concatenation for unmanaged and transactional tables
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java b3ef9169c25c36c3d6c845f5000874fa78e51f82
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java dfad6c192947c9ac80a1bbd86665f46aab128453
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java aca99f2d833822a44f373ded4257af3589707baa
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java feacdd8b605eb75166155fa2e7a1692ad4d52bd0
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 230ca47e4a667b297cf2a6fef90dd18cf3d1a1c3
>   ql/src/test/queries/clientnegative/merge_negative_4.q PRE-CREATION
>   ql/src/test/queries/clientnegative/merge_negative_5.q PRE-CREATION
>   ql/src/test/queries/clientpositive/orc_merge13.q PRE-CREATION
>   ql/src/test/results/clientnegative/merge_negative_3.q.out 906336d4d3ea77ca3174a58fad05668d569f7492
>   ql/src/test/results/clientnegative/merge_negative_4.q.out PRE-CREATION
>   ql/src/test/results/clientnegative/merge_negative_5.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/orc_merge13.q.out PRE-CREATION
> 
> 
> Diff: https://reviews.apache.org/r/62098/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Prasanth_J
> 
>
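The check named in the review request description (fail concatenation for unmanaged and transactional tables) can be illustrated with the sketch below. According to the diff list, the actual patch touches ErrorMsg.java and DDLSemanticAnalyzer.java. This sketch does not use those classes: it defines its own TableKind enum and throws a plain IllegalStateException, and all names in it are hypothetical. It relies on one assumption, that ACID tables are the ones carrying the table property "transactional"="true".

```java
import java.util.Map;

// Hypothetical sketch of the HIVE-17403 rule: reject CONCATENATE for tables Hive
// does not manage and for transactional tables. Not Hive's actual classes.
final class ConcatenateGuardSketch {

    // Simplified stand-in for Hive's table types (only the two cases needed here).
    enum TableKind { MANAGED, EXTERNAL }

    private ConcatenateGuardSketch() {}

    static void checkConcatenateAllowed(String table, TableKind kind, Map<String, String> tableParams) {
        if (kind != TableKind.MANAGED) {
            // Unmanaged (external) data may be loaded by other tools; do not rewrite it.
            throw new IllegalStateException("CONCATENATE is not supported on unmanaged table " + table);
        }
        // Assumed marker for ACID tables: table property "transactional" set to "true".
        if ("true".equalsIgnoreCase(tableParams.getOrDefault("transactional", "false"))) {
            throw new IllegalStateException("CONCATENATE is not supported on transactional table " + table);
        }
    }
}
```

A managed, non-transactional table passes the check silently; an external table, or a managed table with "transactional"="true", is rejected before any merge work starts, which is why (as noted in the thread) the renaming behaviour no longer reaches external/unmanaged tables.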

