hive-dev mailing list archives

From "Laljo John Pullokkaran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases
Date Tue, 08 Apr 2014 21:37:17 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963472#comment-13963472 ]

Laljo John Pullokkaran commented on HIVE-6867:
----------------------------------------------

BucketingSortingReduceSinkOptimizer removes the ReduceSink (RS) operator if the source and
destination tables are bucketed on the same key.
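
To illustrate why removing the RS operator matters here, the following is a toy Python model, not Hive code. It assumes that LOAD DATA places the input file into the table as-is and that an insert without a shuffle copies files one-to-one; the hash function and all helper names are hypothetical stand-ins chosen for the sketch.

```python
# Toy model of the failure described in HIVE-6867. This is not Hive code.

NUM_BUCKETS = 2


def bucket_of(key: str) -> int:
    # Hypothetical stand-in for Hive's bucketing hash (assumption);
    # the model only needs it to be deterministic.
    return sum(key.encode("utf-8")) % NUM_BUCKETS


def load_data(rows):
    # Models LOAD DATA: the file is placed as-is, so there is one file
    # and no row is ever hashed into a bucket.
    return [list(rows)]


def insert_without_shuffle(files):
    # Models the insert after the RS operator is removed:
    # each input file is copied to one output file.
    return [list(f) for f in files]


def insert_with_shuffle(files):
    # Models the insert with the RS operator kept:
    # each row is routed to the output file for its bucket.
    out = [[] for _ in range(NUM_BUCKETS)]
    for f in files:
        for key, val in f:
            out[bucket_of(key)].append((key, val))
    return out


def is_bucketed(files):
    # A correct layout has NUM_BUCKETS files, and file i holds only
    # keys whose bucket number is i.
    return len(files) == NUM_BUCKETS and all(
        bucket_of(key) == i for i, f in enumerate(files) for key, _ in f
    )


rows = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]
src = load_data(rows)

print(is_bucketed(insert_without_shuffle(src)))
print(is_bucketed(insert_with_shuffle(src)))
```

In this model the table metadata says "2 buckets" in both cases, but only the path that keeps the shuffle produces files that match it.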

> Bucketized Table feature fails in some cases
> --------------------------------------------
>
>                 Key: HIVE-6867
>                 URL: https://issues.apache.org/jira/browse/HIVE-6867
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 0.12.0
>            Reporter: Laljo John Pullokkaran
>            Assignee: Laljo John Pullokkaran
>
> The bucketized table feature fails in some cases. If the source and destination are bucketed
> on the same key, but the actual data in the source is not bucketed (because it was loaded
> using LOAD DATA LOCAL INPATH), then the data is not bucketed when written to the destination.
> Example
> ----------------------------------------------------------------------
> CREATE TABLE P1(key STRING, val STRING)
> CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1;
> -- perform an insert to make sure there are 2 files
> INSERT OVERWRITE TABLE P1 select key, val from P1;
> --------------------------------------------------
> This is not a regression; this has never worked.
> It was only discovered because of changes in Hadoop 2.
> In Hadoop 1, in local mode, the number of reducers is always 1, regardless of what the
> application requests. Hadoop 2 now honors the number-of-reducers setting in local mode (by
> spawning threads).
> The long-term solution seems to be to prevent LOAD DATA for bucketed tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
