hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19889) Wrong results due to PPD of non deterministic functions with CBO
Date Mon, 18 Jun 2018 04:58:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515350#comment-16515350 ]

Hive QA commented on HIVE-19889:
--------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  1s{color} | {color:blue} ql in master has 2281 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 39s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11878/dev-support/hive-personality.sh |
| git revision | master / f83d765 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-11878/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> Wrong results due to PPD of non deterministic functions with CBO
> ----------------------------------------------------------------
>
>                 Key: HIVE-19889
>                 URL: https://issues.apache.org/jira/browse/HIVE-19889
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Janaki Lahorani
>            Assignee: Janaki Lahorani
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-19889.1.patch, HIVE-19889.2.patch
>
>
> The following query can give wrong results when CBO is on:
> {code}
> select * from (
> select part1,randum123
> from (SELECT *, cast(rand() as double) AS randum123 FROM testA where part1='CA' and part2 = 'ABC') a
> where randum123 <= 0.5) s where s.randum123 > 0.25 limit 20;
> {code}
> The plan of the query is as follows:
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: testa
>             Statistics: Num rows: 2 Data size: 4580 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: ((rand() <= 0.5D) and (rand() > 0.25D)) (type: boolean)
>               Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE Column stats: NONE
>               Select Operator
>                 expressions: 'CA' (type: string), rand() (type: double)
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE Column stats: NONE
>                 Limit
>                   Number of rows: 20
>                   Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 2290 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: 20
>       Processor Tree:
>         ListSink
> {code}
> The relevant part in the plan is the filter:
> {code}
>             Filter Operator
>               predicate: ((rand() <= 0.5D) and (rand() > 0.25D)) (type: boolean)
> {code}
> The predicates randum123 <= 0.5 and s.randum123 > 0.25 were both pushed down, and randum123 was resolved to rand().  This is wrong because the rand() UDF is non-deterministic and is now invoked twice.  Each call can generate a value that satisfies its own predicate, even when the two values would not satisfy both predicates together, whereas the original intent of the query is to return rows only when a single rand() value falls between 0.25 and 0.5.
> A sample result:
> {code}
> CA	0.9191984370369802
> CA	0.397933021566812
> {code}
> where the condition was not satisfied.
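
The effect described above can be reproduced outside Hive with a small simulation. The sketch below is not Hive code and makes no claim about how Hive evaluates expressions internally. It only mimics the shape of the plan shown in the description, where rand() appears once in each predicate and once more in the Select Operator. The function names, row count, and seed are hypothetical and chosen for illustration.

```python
import random


def intended_filter(rng, n_rows):
    """Evaluate the random value once per row and test both predicates on it."""
    kept = []
    for _ in range(n_rows):
        r = rng.random()  # single draw, as the original query intends
        if r <= 0.5 and r > 0.25:
            kept.append(r)
    return kept


def pushed_down_filter(rng, n_rows):
    """Mimic the reported plan: each reference to the column is a fresh draw."""
    kept = []
    for _ in range(n_rows):
        # Two independent draws, one per pushed-down predicate.
        if rng.random() <= 0.5 and rng.random() > 0.25:
            # The projected value is yet another independent draw.
            kept.append(rng.random())
    return kept


rng = random.Random(19889)  # fixed seed so the run is repeatable (arbitrary value)
good = intended_filter(rng, 10_000)
bad = pushed_down_filter(rng, 10_000)

# Rows whose projected value falls outside the intended (0.25, 0.5] range.
violations = [r for r in bad if not (0.25 < r <= 0.5)]

print("intended filter kept:", len(good))
print("pushed-down filter kept:", len(bad))
print("rows violating the intended range:", len(violations))
```

In the single-draw version every returned value lies in (0.25, 0.5]. In the version that redraws per reference, the returned value is unrelated to the values that passed the filter, which matches the sample result above where 0.9191984370369802 was returned.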



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
