hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16972) FetchOperator: filter out inputSplits which length is zero
Date Wed, 28 Jun 2017 11:53:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066381#comment-16066381 ]

Hive QA commented on HIVE-16972:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12874800/HIVE-16972.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5805/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5805/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5805/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-06-28 11:52:09.803
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5805/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-06-28 11:52:09.805
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at fafa953 HIVE-16969: Improvement performance of MapOperator for Parquet (Colin Ma, reviewed by Ferdinand Xu)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at fafa953 HIVE-16969: Improvement performance of MapOperator for Parquet (Colin Ma, reviewed by Ferdinand Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-06-28 11:52:12.177
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:24
error: ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12874800 - PreCommit-HIVE-Build

> FetchOperator: filter out inputSplits which length is zero
> ----------------------------------------------------------
>
>                 Key: HIVE-16972
>                 URL: https://issues.apache.org/jira/browse/HIVE-16972
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Chaozhong Yang
>            Assignee: Chaozhong Yang
>             Fix For: 2.1.2
>
>         Attachments: HIVE-16972.2.patch, HIVE-16972.patch
>
>
> * Background
>    The basic workflow of a common HQL query can be described as follows:
>   1. compile and execute
>   2. fetch results
>   In many cases, we don't need to worry about fetching results from HDFS (provided that
> MapReduce jobs are generated in the planning step). However, the number of result files on
> HDFS and the data distribution affect the final status of an HQL query, especially for
> HiveServer2. We have some map-only queries, e.g.:
> {code:sql}
> select * from myTable where date > '20170101' and date <= '20170301' and id = 88;
> {code}
>     This query generates more than 20,000 files on HDFS (see the uploaded screenshot), and
> most of those files are empty; the matching data is very sparse. If a HiveServer2 client
> sends a TFetchResultsRequest with parameters such as timeout: 90 s and maxRows: 1024,
> FetchOperator cannot fetch 1024 rows within 90 seconds, and the client marks the
> TFetchResultsRequest as a timed-out failure. Why? Fetching results from an empty file is
> expensive: in our HDFS cluster (5000+ DataNodes), reading an empty file costs almost 100 ms,
> so reading 1000 of them costs about 100 ms * 1000 = 100 s, which exceeds the 90 s timeout.
> Filtering out these empty files or splits can therefore speed up FetchResults.
>   
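A minimal sketch of the filtering idea described above, assuming a simplified, hypothetical `Split` class (a path plus a byte length) as a stand-in for Hadoop's `InputSplit`. It is not the code from the attached patch. It drops zero-length splits before any reader would be opened, and it reproduces the issue's back-of-the-envelope timeout arithmetic (about 100 ms per empty file, 1000 files, 90 s timeout).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the HIVE-16972 idea: skip zero-length input splits before the
// fetch loop opens a record reader for each one.
// ASSUMPTION: Split is a hypothetical stand-in for Hadoop's InputSplit; this
// is not the actual FetchOperator implementation.
public class EmptySplitFilter {

    // Hypothetical split: a file path plus its length in bytes.
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    // Returns only the splits that contain at least one byte to read.
    static List<Split> filterNonEmpty(List<Split> splits) {
        List<Split> nonEmpty = new ArrayList<>();
        for (Split s : splits) {
            if (s.length > 0) {
                nonEmpty.add(s);
            }
        }
        return nonEmpty;
    }

    // Rough cost model from the issue: a fixed latency per opened split.
    static long estimatedOpenCostMillis(long openedSplits, long perSplitMillis) {
        return openedSplits * perSplitMillis;
    }

    public static void main(String[] args) {
        List<Split> splits = Arrays.asList(
                new Split("part-00000", 0L),
                new Split("part-00001", 4096L),
                new Split("part-00002", 0L));

        // Only part-00001 survives the filter.
        System.out.println(filterNonEmpty(splits).size()); // prints 1

        // 1000 empty files at ~100 ms each = 100000 ms (100 s), above a 90 s timeout.
        long cost = estimatedOpenCostMillis(1000, 100);
        System.out.println(cost + " ms, exceeds 90 s timeout: " + (cost > 90_000));
    }
}
```

The design choice is to make the length check happen before any per-file I/O: for file-based splits the length is already recorded when splits are computed, so testing it is far cheaper than opening a reader on an empty file.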



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
