Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 11 Jun 2014 21:19:01 +0000 (UTC)
From: "Hari Sankar Sivarama Subramaniyan (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12718935.1402088043906.112373.1402521541689@arcas>
In-Reply-To: <JIRA.12718935.1402088043906@arcas>
References: <JIRA.12718935.1402088043906@arcas>
Subject: [jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with
 vectorization
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7188:
----------------------------------------------------

    Status: Open  (was: Patch Available)

> sum(if()) returns wrong results with vectorization
> --------------------------------------------------
>
>                 Key: HIVE-7188
>                 URL: https://issues.apache.org/jira/browse/HIVE-7188
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz
>
>
> 1. The tgz file containing the setup is attached.
> 2. Run the following query
> select
>     sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     from hike_error.ttr_day0;
> returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off.
> hive> source insert.sql
>     > ;
> OK
> Time taken: 0.359 seconds
> OK
> Time taken: 0.015 seconds
> OK
> Time taken: 0.069 seconds
> OK
> Time taken: 0.176 seconds
> Loading data to table hike_error.ttr_day0
> Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0]
> OK
> Time taken: 0.33 seconds
> hive> select
>     >     sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     >     from hike_error.ttr_day0;
> Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y40000gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log
> Job running in-process (local Hadoop)
> Hadoop job information for null: number of mappers: 0; number of reducers: 0
> 2014-06-06 13:47:02,043 null map = 0%,  reduce = 100%
> Ended Job = job_local773704964_0001
> Execution completed successfully
> MapredLocal task succeeded
> OK
> 131
> Time taken: 5.325 seconds, Fetched: 1 row(s)
> hive> set hive.vectorized.execution.enabled=true;                                                                
> hive> select
>     >     sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     >     from hike_error.ttr_day0;
> Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y40000gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log
> Job running in-process (local Hadoop)
> Hadoop job information for null: number of mappers: 0; number of reducers: 0
> 2014-06-06 13:47:18,604 null map = 0%,  reduce = 100%
> Ended Job = job_local701415676_0001
> Execution completed successfully
> MapredLocal task succeeded
> OK
> 0
> Time taken: 5.52 seconds, Fetched: 1 row(s)
> hive> explain select
>     >     sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning
>     >     from hike_error.ttr_day0;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: ttr_day0
>             Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: is_returning (type: boolean), is_free (type: boolean)
>               outputColumnNames: is_returning, is_free
>               Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0))
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   sort order: 
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: bigint)
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: sum(VALUE._col0)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>           Select Operator
>             expressions: _col0 (type: bigint)
>             outputColumnNames: _col0
>             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 0.079 seconds, Fetched: 49 row(s)


--
This message was sent by Atlassian JIRA
(v6.2#6252)