hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-17923) 'cluster by' should not be needed for a bucketed table
Date Mon, 30 Oct 2017 18:36:02 GMT

     [ https://issues.apache.org/jira/browse/HIVE-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-17923:
----------------------------------
    Issue Type: Bug  (was: Sub-task)
        Parent:     (was: HIVE-17458)

> 'cluster by' should not be needed for a bucketed table
> ------------------------------------------------------
>
>                 Key: HIVE-17923
>                 URL: https://issues.apache.org/jira/browse/HIVE-17923
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Priority: Blocker
>
> Given the table
> {noformat}
> CREATE TABLE over10k_orc_bucketed(t tinyint,
>            si smallint,
>            i int,
>            b bigint,
>            f float,
>            d double,
>            bo boolean,
>            s string,
>            ts timestamp,
>            `dec` decimal(4,2),
>            bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
> {noformat}
> {noformat}
> insert into over10k_orc_bucketed select * from over10k
> {noformat}
> produces 1 data file (bucket 0).  It should produce 4 based on input data.
> {noformat}
> insert into over10k_orc_bucketed select * from over10k cluster by si
> {noformat}
> does the right thing.
> The full script is in acid_vectorization_original.q.
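
The expectation in the description (4 data files rather than 1) follows from how rows are assigned to buckets. Below is a minimal sketch of that assignment, not Hive's implementation. It assumes the bucket id is a non-negative hash of the clustering column taken modulo the bucket count, and it uses the integer value itself as a stand-in hash; the hash Hive actually applies may differ by version. The names `bucket_for`, `distribute`, and the sample `si` values are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # from CLUSTERED BY(si) INTO 4 BUCKETS


def bucket_for(si, num_buckets=NUM_BUCKETS):
    """Return a bucket id for a clustering-column value.

    Assumption: the hash is the integer value itself, masked to be
    non-negative. This is a stand-in, not Hive's actual hash function.
    """
    return (si & 0x7FFFFFFF) % num_buckets


def distribute(si_values, num_buckets=NUM_BUCKETS):
    """Group values by bucket id, mimicking one output file per bucket."""
    buckets = defaultdict(list)
    for si in si_values:
        buckets[bucket_for(si, num_buckets)].append(si)
    return dict(buckets)


# Hypothetical si values; the real over10k data is not reproduced here.
sample = [256, 257, 258, 259, 260]
files = distribute(sample)
print(sorted(files))  # bucket ids that received at least one row
```

With input that covers all four residues, every bucket receives rows, which is why a single data file (bucket 0 only) indicates that the insert skipped the bucketing step.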



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
