hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18049) Enable Hive on Tez to provide globally sorted clustered table
Date Mon, 13 Nov 2017 06:54:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249162#comment-16249162 ]

Hive QA commented on HIVE-18049:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12897286/HIVE-18049.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7784/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7784/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7784/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-11-13 06:53:32.575
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7784/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-11-13 06:53:32.578
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   67888cf..25a6f4c  master     -> origin/master
+ git reset --hard HEAD
HEAD is now at 67888cf HIVE-17995 Run checkstyle on standalone-metastore module with proper configuration (Adam Szita via Alan Gates)
+ git clean -f -d
Removing ${project.basedir}/
Removing ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
Removing ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedListColumnReader.java
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 25a6f4c HIVE-17615: Task.executeTask has to be thread safe for parallel execution (Anishek Agarwal reviewed by Daniel Dai)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-11-13 06:53:37.768
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
fatal: git diff header lacks filename information when removing 0 leading pathname components (line 41)
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}
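
The log above ends with the patch-apply step reporting that the patch does not apply with p0, p1, or p2, that is, after stripping 0, 1, or 2 leading path components from the file names in the diff headers. The Java sketch below shows what stripping N components means and tries levels 0 to 2 against a set of known repository files. It is an assumption about how such a check could work, not the actual logic of smart-apply-patch.sh (which the log does not show), and StripLevels and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of patch strip levels (-p0, -p1, -p2).
// Not the logic of smart-apply-patch.sh, which is not shown in the log.
final class StripLevels {
    // Drop the first n slash-separated components of a path, as -pN does.
    // Returns null if the path has too few components.
    static String strip(String path, int n) {
        String p = path;
        for (int i = 0; i < n; i++) {
            int slash = p.indexOf('/');
            if (slash < 0) {
                return null;
            }
            p = p.substring(slash + 1);
        }
        return p;
    }

    // Return the first level in 0..2 at which every header path maps to a
    // known file, or -1 if none does ("does not apply with p0, p1, or p2").
    // Simplification: real tools also handle files that the patch creates.
    static int firstApplicableLevel(List<String> headerPaths, Set<String> repoFiles) {
        for (int level = 0; level <= 2; level++) {
            boolean ok = true;
            for (String path : headerPaths) {
                String stripped = strip(path, level);
                if (stripped == null || !repoFiles.contains(stripped)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return level;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<String> repo = new HashSet<>(Arrays.asList("ql/src/java/Foo.java"));
        // git-style headers use a/ and b/ prefixes, so they usually apply at -p1.
        System.out.println(firstApplicableLevel(Arrays.asList("a/ql/src/java/Foo.java"), repo));
        System.out.println(firstApplicableLevel(Arrays.asList("x/y/z/Foo.java"), repo));
    }
}
```

Run as written, the first call prints 1 and the second prints -1, because no strip level turns x/y/z/Foo.java into a path that exists in the example repository.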

This message is automatically generated.

ATTACHMENT ID: 12897286 - PreCommit-HIVE-Build

> Enable Hive on Tez to provide globally sorted clustered table
> -------------------------------------------------------------
>
>                 Key: HIVE-18049
>                 URL: https://issues.apache.org/jira/browse/HIVE-18049
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, Tez
>            Reporter: LingXiao Lan
>             Fix For: 2.1.1
>
>         Attachments: HIVE-18049.1.patch, HIVE-18049.2.patch, HIVE-18049.3.patch
>
>
> {code:sql}
> CREATE TABLE `test`(
>    `time` int,
>    `userid` bigint)
>  CLUSTERED BY (
>    userid)
>  SORTED BY (
>    userid ASC)
>  INTO 4 BUCKETS
>  ;
> {code}
> When inserting data into this table, the data will be automatically sorted into 4 buckets.
> But because Hive uses a hash partitioner by default, the data is only sorted within each bucket
> and isn't sorted across buckets. Sometimes we need the data to be globally sorted,
> for example to optimize indexing.
> If we sample the table first and use TotalOrderPartitioner, this can be done.
> The difficulty is deciding automatically when to use TotalOrderPartitioner and when
> not to, because an insertion query can be complex and result in a complex DAG in Tez.
> I have implemented a temporary version. It uses a custom partitioner that combines
> the hash partitioner and the total-order partitioner, and a physical optimizer added to Hive
> decides which partitioner to use. However, to reduce the workload, this version also modifies
> the Tez source code, which is not actually necessary.
> I'm wondering whether we can implement a more general version that addresses this issue.
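
The description notes that hash partitioning sorts rows within each bucket but not across buckets. The Java sketch below illustrates why, using the 4-bucket userid example. It assumes a Hadoop HashPartitioner-style formula, (hash & Integer.MAX_VALUE) % numBuckets; Hive's own bucketing hash may differ in detail, and HashBucketing and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: hash bucketing with per-bucket sorting, mimicking
// CLUSTERED BY (userid) SORTED BY (userid ASC) INTO n BUCKETS.
// Uses a Hadoop HashPartitioner-style formula; Hive's own bucketing
// hash may differ in detail.
final class HashBucketing {
    // Non-negative hash modulo the bucket count.
    static int bucketFor(long userid, int numBuckets) {
        return (Long.hashCode(userid) & Integer.MAX_VALUE) % numBuckets;
    }

    // Route each key to its bucket, then sort every bucket independently.
    static List<List<Long>> bucketAndSort(long[] userids, int numBuckets) {
        List<List<Long>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (long id : userids) {
            buckets.get(bucketFor(id, numBuckets)).add(id);
        }
        for (List<Long> bucket : buckets) {
            bucket.sort(null); // natural (ascending) order
        }
        return buckets;
    }

    // Read the buckets back in bucket order, like scanning files 0..n-1.
    static List<Long> concat(List<List<Long>> buckets) {
        List<Long> all = new ArrayList<>();
        for (List<Long> bucket : buckets) {
            all.addAll(bucket);
        }
        return all;
    }

    static boolean isSorted(List<Long> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Long>> buckets = bucketAndSort(new long[] {1, 2, 3, 4, 5, 6, 7, 8}, 4);
        System.out.println(buckets);                   // each bucket is sorted
        System.out.println(isSorted(concat(buckets))); // but not globally
    }
}
```

With userids 1 through 8, the buckets come out as [[4, 8], [1, 5], [2, 6], [3, 7]]: each bucket is sorted, yet reading them in bucket order gives 4, 8, 1, 5, 2, 6, 3, 7, so the program prints false for the global check.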

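The sampling approach suggested in the description can be sketched as follows: sort a sample, take numPartitions - 1 evenly spaced split points, and route each key to the range it falls in by binary search. This is in the spirit of Hadoop's InputSampler and TotalOrderPartitioner but is not their code; TotalOrderSketch and its methods are hypothetical names, and a real job would sample only a fraction of the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of sample-based range (total-order) partitioning,
// in the spirit of Hadoop's InputSampler + TotalOrderPartitioner.
final class TotalOrderSketch {
    // Sort the (non-empty) sample and take numPartitions - 1 evenly spaced split points.
    static long[] splitPoints(long[] sample, int numPartitions) {
        long[] sorted = sample.clone();
        Arrays.sort(sorted);
        long[] splits = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[(int) ((long) i * sorted.length / numPartitions)];
        }
        return splits;
    }

    // Partition index = number of split points <= key, found by binary search.
    static int partitionFor(long key, long[] splits) {
        int pos = Arrays.binarySearch(splits, key) + 1;
        return pos < 0 ? -pos : pos;
    }

    // Route keys by range, then sort each partition locally.
    static List<List<Long>> partitionAndSort(long[] keys, long[] splits) {
        List<List<Long>> parts = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (long k : keys) {
            parts.get(partitionFor(k, splits)).add(k);
        }
        for (List<Long> p : parts) {
            p.sort(null);
        }
        return parts;
    }

    public static void main(String[] args) {
        long[] userids = {8, 3, 1, 6, 2, 7, 5, 4};
        long[] splits = splitPoints(userids, 4); // full data used as the "sample" here
        System.out.println(Arrays.toString(splits));
        System.out.println(partitionAndSort(userids, splits));
    }
}
```

For these keys the split points are [3, 5, 7] and the partitions are [[1, 2], [3, 4], [5, 6], [7, 8]], so concatenating them in partition order is globally sorted. How evenly rows spread across buckets depends on how representative the sample is.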

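The reporter's patch is not shown here, so the following is only a hypothetical illustration of a partitioner that combines the two strategies. The decision that the description assigns to a physical optimizer is reduced to whether split points are supplied when the partitioner is built; CombinedPartitioner and its methods are invented names, not the patch's API.

```java
import java.util.Arrays;

// Hypothetical sketch: one partitioner that acts as either a hash
// partitioner or a total-order partitioner. The mode is fixed up front
// (by a physical optimizer in the issue); here, null split points = hash mode.
final class CombinedPartitioner {
    private final long[] splitPoints; // sorted split points, or null for hash mode

    CombinedPartitioner(long[] splitPoints) {
        this.splitPoints = splitPoints == null ? null : splitPoints.clone();
    }

    boolean isTotalOrder() {
        return splitPoints != null;
    }

    int getPartition(long key, int numPartitions) {
        if (splitPoints == null) {
            // Hash mode: same shape as Hadoop's HashPartitioner.
            return (Long.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
        }
        if (splitPoints.length + 1 != numPartitions) {
            throw new IllegalArgumentException("expected " + (splitPoints.length + 1) + " partitions");
        }
        // Total-order mode: count split points <= key via binary search.
        int pos = Arrays.binarySearch(splitPoints, key) + 1;
        return pos < 0 ? -pos : pos;
    }

    public static void main(String[] args) {
        CombinedPartitioner hash = new CombinedPartitioner(null);
        CombinedPartitioner range = new CombinedPartitioner(new long[] {3, 5, 7});
        for (long id = 1; id <= 8; id++) {
            System.out.println(id + " -> hash " + hash.getPartition(id, 4)
                    + ", range " + range.getPartition(id, 4));
        }
    }
}
```

Because the mode is fixed when the partitioner is constructed, every task routes keys the same way. In total-order mode the number of partitions must be one more than the number of split points, which the sketch checks.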

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
