hive-dev mailing list archives

From "Laljo John Pullokkaran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-6867) Bucketized Table feature fails in some cases
Date Tue, 08 Apr 2014 21:31:17 GMT
Laljo John Pullokkaran created HIVE-6867:
--------------------------------------------

             Summary: Bucketized Table feature fails in some cases
                 Key: HIVE-6867
                 URL: https://issues.apache.org/jira/browse/HIVE-6867
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 0.12.0
            Reporter: Laljo John Pullokkaran
            Assignee: Laljo John Pullokkaran


Bucketized Table feature fails in some cases. If the source and destination tables are
bucketed on the same key, and the actual data in the source is not bucketed (because it was
loaded using LOAD DATA LOCAL INPATH), then the data won't be bucketed when it is written to
the destination.
Example
----------------------------------------------------------------------
CREATE TABLE P1(key STRING, val STRING)
CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1;
-- perform an insert to make sure there are 2 files
INSERT OVERWRITE TABLE P1 select key, val from P1;
--------------------------------------------------
This is not a regression; it has never worked.
It was only discovered because of the Hadoop2 changes.
In Hadoop1, in local mode, the number of reducers is always 1, regardless of what the
application requests. Hadoop2 now honors the requested number of reducers in local mode
(by spawning threads).
The long-term solution seems to be to disallow LOAD DATA for bucketed tables.
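
Until LOAD DATA is blocked for bucketed tables, one possible workaround is to load the raw
file into an unbucketed staging table and let an INSERT do the bucketing. This is only a
sketch: the staging table name P1_STAGING is made up for illustration, and it assumes the
hive.enforce.bucketing and hive.enforce.sorting settings are available in the Hive version
in use. Whether two files actually result still depends on the requested number of reducers
being honored, as described for local mode.

```sql
-- Assumption: P1_STAGING is a hypothetical, unbucketed staging table.
CREATE TABLE P1_STAGING(key STRING, val STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1_STAGING;

-- Ask Hive to bucket (and sort) rows on insert according to the target table definition.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- The source is not declared as bucketed, so Hive cannot assume the rows are already bucketed.
INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_STAGING;
```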



--
This message was sent by Atlassian JIRA
(v6.2#6252)
