hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15148) disallow loading data into bucketed tables (by default)
Date Fri, 11 Nov 2016 08:07:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656460#comment-15656460 ]

Hive QA commented on HIVE-15148:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12838482/HIVE-15148.01.patch

{color:green}SUCCESS:{color} +1 due to 95 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10637 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_1] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_2] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_orig_table] (batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=56)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=131)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=91)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_1] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_2] (batchId=116)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2079/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2079/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2079/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12838482 - PreCommit-HIVE-Build

> disallow loading data into bucketed tables (by default)
> -------------------------------------------------------
>
>                 Key: HIVE-15148
>                 URL: https://issues.apache.org/jira/browse/HIVE-15148
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-15148.01.patch, HIVE-15148.patch
>
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly hashed data and the correct order of file names; if there's some discrepancy in any of the above, the queries will fail or may produce incorrect results if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know some code derives bucket number from file name, which won't work in this case (as opposed to getting buckets based on the order of files, which will work here but won't work as per HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled these days), so I suggest that we either prohibit the above outright, or at least add a safety config setting that would disallow it by default.
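
To make the failure mode in the description concrete, here is a rough Python sketch of the check a user would effectively have to perform by hand before loading files into a bucketed table. It is not Hive code. It assumes ASCII string keys, a Java-style string hash, and the bucketing rule (hash & Integer.MAX_VALUE) % numBuckets; the names bucket_for and check_bucket_files are hypothetical and exist only for this illustration.

```python
def java_style_hash(s):
    """Low 32 bits of a Java String.hashCode()-style hash (assumes BMP/ASCII keys)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_for(key, num_buckets):
    """Assumed bucketing rule: (hash & Integer.MAX_VALUE) % numBuckets."""
    return (java_style_hash(key) & 0x7FFFFFFF) % num_buckets


def check_bucket_files(files, num_buckets):
    """Return (file_index, key, expected_bucket) for every misplaced key.

    `files` holds one list of keys per data file, in the order the files
    would sort by name. A file-count mismatch raises ValueError, since the
    table then cannot have one file per bucket.
    """
    if len(files) != num_buckets:
        raise ValueError(
            "expected %d files, got %d" % (num_buckets, len(files))
        )
    problems = []
    for index, keys in enumerate(files):
        for key in keys:
            expected = bucket_for(key, num_buckets)
            if expected != index:
                problems.append((index, key, expected))
    return problems


if __name__ == "__main__":
    # Two hand-made "files" for a table declared INTO 2 BUCKETS.
    for problem in check_bucket_files([["a"], ["b"]], 2):
        print("file %d holds key %r, which belongs in bucket %d" % problem)
```

Any non-empty result, or a ValueError, corresponds to one of the discrepancies listed above: a plain load would accept such files without complaint, and bucket-based optimizations could then read the wrong data.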



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
