Date: Wed, 11 Jun 2014 21:19:01 +0000 (UTC)
From: "Hari Sankar Sivarama Subramaniyan (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization

     [ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7188:
----------------------------------------------------
    Status: Open  (was: Patch Available)

> sum(if()) returns wrong results with vectorization
> --------------------------------------------------
>
>                 Key: HIVE-7188
>                 URL: https://issues.apache.org/jira/browse/HIVE-7188
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz
>
>
> 1. The tgz file containing the setup is attached.
> 2. Run the following query:
> select
> sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
> from hike_error.ttr_day0;
> It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off.
> hive> source insert.sql
>     > ;
> OK
> Time taken: 0.359 seconds
> OK
> Time taken: 0.015 seconds
> OK
> Time taken: 0.069 seconds
> OK
> Time taken: 0.176 seconds
> Loading data to table hike_error.ttr_day0
> Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0]
> OK
> Time taken: 0.33 seconds
> hive> select
>     > sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     > from hike_error.ttr_day0;
> Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y40000gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
> Job running in-process (local Hadoop)
> Hadoop job information for null: number of mappers: 0; number of reducers: 0
> 2014-06-06 13:47:02,043 null map = 0%,  reduce = 100%
> Ended Job = job_local773704964_0001
> Execution completed successfully
> MapredLocal task succeeded
> OK
> 131
> Time taken: 5.325 seconds, Fetched: 1 row(s)
> hive> set hive.vectorized.execution.enabled=true;
> hive> select
>     > sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     > from hike_error.ttr_day0;
> Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y40000gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
> Job running in-process (local Hadoop)
> Hadoop job information for null: number of mappers: 0; number of reducers: 0
> 2014-06-06 13:47:18,604 null map = 0%,  reduce = 100%
> Ended Job = job_local701415676_0001
> Execution completed successfully
> MapredLocal task succeeded
> OK
> 0
> Time taken: 5.52 seconds, Fetched: 1 row(s)
> hive> explain select
>     > sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     > from hike_error.ttr_day0;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: ttr_day0
>             Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: is_returning (type: boolean), is_free (type: boolean)
>               outputColumnNames: is_returning, is_free
>               Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0))
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   sort order:
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: bigint)
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: sum(VALUE._col0)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: _col0 (type: bigint)
>             outputColumnNames: _col0
>             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 0.079 seconds, Fetched: 49 row(s)

--
This message was sent by Atlassian JIRA
(v6.2#6252)