hive-issues mailing list archives

From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
Date Fri, 29 May 2015 21:57:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565495#comment-14565495 ]

Pengcheng Xiong commented on HIVE-6867:
---------------------------------------

[~xuefuz], yes, the problem still remains. As I noted in my comment on Review Board (RB)
this morning: "After discussing this with the Hive JDBC developers, we found that the current
infrastructure does not support passing warning messages through JDBC. We acknowledge that
this is something we need to improve in the future."

> Bucketized Table feature fails in some cases
> --------------------------------------------
>
>                 Key: HIVE-6867
>                 URL: https://issues.apache.org/jira/browse/HIVE-6867
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.12.0
>            Reporter: Laljo John Pullokkaran
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, HIVE-6867.03.patch, HIVE-6867.04.patch, HIVE-6867.05.patch
>
>
> The bucketized table feature fails in some cases. If the source and destination tables are
> bucketed on the same key, but the data actually stored in the source is not bucketed (because
> it was loaded with LOAD DATA LOCAL INPATH), then the data will not be bucketed when it is
> written to the destination.
> Example
> ----------------------------------------------------------------------
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --------------------------------------------------
> This is not a regression. This has never worked.
> It was only discovered because of changes in Hadoop 2.
> In Hadoop 1 local mode, the number of reducers is always 1, regardless of what the
> application requests. Hadoop 2 now honors the configured number of reducers in local mode
> (by spawning threads).
> The long-term solution seems to be to prevent LOAD DATA into bucketed tables.



