hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
Date Thu, 13 Nov 2014 04:36:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209256#comment-14209256
] 

Hive QA commented on HIVE-7664:
-------------------------------



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662948/HIVE-7664.2.patch.txt

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1764/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1764/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1764/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and
output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1764/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1639245.

At revision 1639245.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662948 - PreCommit-HIVE-TRUNK-Build

> VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes
25% CPU
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7664
>                 URL: https://issues.apache.org/jira/browse/HIVE-7664
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Mostafa Mokhtar
>            Assignee: Gopal V
>         Attachments: HIVE-7664.1.patch.txt, HIVE-7664.2.patch.txt
>
>
> In a Group by heavy vectorized Reducer vertex 25% of CPU is spent in VectorizedBatchUtil.addRowToBatchFrom().
> Looked at the code of VectorizedBatchUtil.addRowToBatchFrom and it looks like it wasn't
optimized for Vectorized processing.
> addRowToBatchFrom is called for every row and for each row and every column in the batch
getPrimitiveCategory is called to figure the type of each column, column types are stored
in a HashMap, for VectorGroupByOperator columns types won't change between batches, so column
types shouldn't be looked up for every row.
> I recommend storing the column type in StructObjectInspector so that other components
can leverage this optimization.
> Also addRowToBatchFrom has a case statement for every row and every column used for type
casting I recommend encapsulating the type logic in templatized methods.   
> {code}
> Stack Trace	Sample Count	Percentage(%)
> VectorizedBatchUtil.addRowToBatchFrom	86	26.543
>    AbstractPrimitiveObjectInspector.getPrimitiveCategory()	34	10.494
>    LazyBinaryStructObjectInspector.getStructFieldData	25	7.716
>    StandardStructObjectInspector.getStructFieldData	4	1.235
> {code}
> The query used : 
> {code}
> select 
>     ss_sold_date_sk
> from
>     store_sales
> where
>     ss_sold_date between '1998-01-01' and '1998-06-01'
> group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
> having sum(ss_list_price) > 50000000000000;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message