Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id BD43310727 for ; Fri, 23 Jan 2015 19:26:34 +0000 (UTC) Received: (qmail 95637 invoked by uid 500); 23 Jan 2015 19:26:34 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 95607 invoked by uid 500); 23 Jan 2015 19:26:34 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 95597 invoked by uid 99); 23 Jan 2015 19:26:34 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Jan 2015 19:26:34 +0000 Date: Fri, 23 Jan 2015 19:26:34 +0000 (UTC) From: "Deneche A. Hakim (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Comment Edited] (DRILL-1572) accuracy issue with tpch query 01.q and 10.q MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287907#comment-14287907 ] Deneche A. Hakim edited comment on DRILL-1572 at 1/23/15 7:26 PM: ------------------------------------------------------------------ Running this simplified query more than once gives slightly different results for the sum. As explained by Daniel, in a comment below, these slight differences are expected because the order of operations may change each time we run the query. {noformat} select sum(l_extendedprice) as sum_base_price, count(*) as `count` from dfs.`drillTestDirTpch100Parquet`.`lineitem` where l_shipdate <= date '1998-12-01' - interval '120' day (3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus {noformat} 1st run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194428E12 | 148047881 | | 1.4777109838597998E11 | 3864590 | | 1.0901214476134318E13 | 285095988 | | 5.661603032745359E12 | 148067261 | +----------------+------------+ {noformat} 2nd run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194429E12 | 148047881 | | 1.4777109838598E11 | 3864590 | | 1.0901214476134314E13 | 285095988 | | 5.66160303274536E12 | 148067261 | +----------------+------------+ {noformat} 3rd run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194428E12 | 148047881 | | 1.4777109838598004E11 | 3864590 | | 1.0901214476134316E13 | 285095988 | | 5.661603032745359E12 | 148067261 | +----------------+------------+ {noformat} 4th run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194428E12 | 148047881 | | 1.4777109838597995E11 | 3864590 | | 1.0901214476134316E13 | 285095988 | | 5.661603032745362E12 | 148067261 | +----------------+------------+ {noformat} was (Author: adeneche): Running this simplified query more than once gives slightly different results for the sum: {noformat} select sum(l_extendedprice) as sum_base_price, count(*) as `count` from dfs.`drillTestDirTpch100Parquet`.`lineitem` where l_shipdate <= date '1998-12-01' - interval '120' day (3) group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus {noformat} 1st run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194428E12 | 148047881 | | 1.4777109838597998E11 | 3864590 | | 1.0901214476134318E13 | 285095988 | | 5.661603032745359E12 | 148067261 | +----------------+------------+ {noformat} 2nd run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194429E12 | 148047881 | | 1.4777109838598E11 | 3864590 | | 1.0901214476134314E13 | 285095988 | | 5.66160303274536E12 | 148067261 | +----------------+------------+ {noformat} 3rd run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194428E12 | 148047881 | | 1.4777109838598004E11 | 3864590 | | 1.0901214476134316E13 | 285095988 | | 5.661603032745359E12 | 148067261 | +----------------+------------+ {noformat} 4th run: {noformat} +----------------+------------+ | sum_base_price | count | +----------------+------------+ | 5.660776097194428E12 | 148047881 | | 1.4777109838597995E11 | 3864590 | | 1.0901214476134316E13 | 285095988 | | 5.661603032745362E12 | 148067261 | +----------------+------------+ {noformat} > accuracy issue with tpch query 01.q and 10.q > -------------------------------------------- > > Key: DRILL-1572 > URL: https://issues.apache.org/jira/browse/DRILL-1572 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 0.7.0 > Reporter: Chun Chang > Assignee: Deneche A. Hakim > Priority: Critical > Fix For: 0.8.0 > > > code base: > #Wed Oct 22 11:40:19 PDT 2014 > git.commit.id.abbrev=ae2790e > The following two tpch queries failed verification due to accuracy in returned data. > /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/01.q : > {noformat} > -- using 1395599672 as a seed to the RNG > select > l_returnflag, > l_linestatus, > sum(l_quantity) as sum_qty, > sum(l_extendedprice) as sum_base_price, > sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, > sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, > avg(l_quantity) as avg_qty, > avg(l_extendedprice) as avg_price, > avg(l_discount) as avg_disc, > count(*) as count_order > from > lineitem > where > l_shipdate <= date '1998-12-01' - interval '120' day (3) > group by > l_returnflag, > l_linestatus > order by > l_returnflag, > l_linestatus > {noformat} > Expected number of rows: 4 > Actual number of rows from Drill: 4 > Number of matching rows: 0 > Number of rows missing: 4 > Number of rows unexpected: 4 > {noformat} > These rows are not expected (first 10): > A F 3.775127758E9 5.660776097194428E12 5.377736398183944E12 5.59284742951595E12 25.499370423275426 38236.11698430475 0.05000224353079674 148047881 > N O 7.269911583E9 1.0901214476134316E13 1.0356163586785012E13 1.0770418891237377E13 25.499873337396807 38236.997134222445 0.04999763132401859 285095988 > R F 3.77572497E9 5.661603032745363E12 5.378513563915393E12 5.593662252666899E12 25.50006628406532 38236.697258453125 0.050001304339521574 148067261 > N F 9.8553062E7 1.4777109838597995E11 1.4038496596503476E11 1.4599979303277576E11 25.501556956882876 38237.19938880449 0.04998528433803116 3864590 > These rows are missing (first 10): > A F 3.775127758E9 5.660776097197787E12 5.377736398184481E12 5.592847429514863E12 25.499370423275426 38236.11698432743 0.05000224347714149 148047881 (1 time(s)) > N O 7.269911583E9 1.0901214476133223E13 1.0356163586779275E13 1.0770418891231504E13 25.499873337396807 38236.99713421861 0.04999763124732218 285095988 (1 time(s)) > R F 3.77572497E9 5.661603032743618E12 5.378513563916123E12 5.593662252665821E12 25.50006628406532 38236.69725844134 0.05000130428587516 148067261 (1 time(s)) > N F 9.8553062E7 1.4777109838598825E11 1.4038496596503897E11 1.4599979303278268E11 25.501556956882876 38237.19938880664 0.04998528433773886 3864590 (1 time(s)) > {noformat} > Test_Failed: 2014/10/22 11:26:11.0011 - Verification failed. > /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/10.q : > {noformat} > -- tpch10 using 1395599672 as a seed to the RNG > select > c.c_custkey, > c.c_name, > sum(l.l_extendedprice * (1 - l.l_discount)) as revenue, > c.c_acctbal, > n.n_name, > c.c_address, > c.c_phone, > c.c_comment > from > customer c, > orders o, > lineitem l, > nation n > where > c.c_custkey = o.o_custkey > and l.l_orderkey = o.o_orderkey > and o.o_orderdate >= date '1994-03-01' > and o.o_orderdate < date '1994-03-01' + interval '3' month > and l.l_returnflag = 'R' > and c.c_nationkey = n.n_nationkey > group by > c.c_custkey, > c.c_name, > c.c_acctbal, > c.c_phone, > n.n_name, > c.c_address, > c.c_comment > order by > revenue desc > limit 20 > {noformat} > Expected number of rows: 20 > Actual number of rows from Drill: 20 > Number of matching rows: 17 > Number of rows missing: 3 > Number of rows unexpected: 3 > {noformat} > These rows are not expected (first 10): > 6372220 Customer#006372220 793123.1516 2836.62 FRANCE bfd3hpM99xDp6AFsGNOPP 16-143-244-4177 regular theodolites are according to the unusual > 14211121 Customer#014211121 796135.1836 7443.03 MOROCCO ks7nhxDqzdk72CfWM 25-755-902-4219 lyly final packages doubt furiously carefully bold theodolites. final > 246700 Customer#000246700 801786.5193999999 5244.71 CHINA o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc 28-466-828-8872 even asymptotes cajole slyly with the furiously bold accounts. furiously unusual platelets believe quickly final, > These rows are missing (first 10): > 14211121 Customer#014211121 796135.1835999999 7443.03 MOROCCO ks7nhxDqzdk72CfWM 25-755-902-4219 lyly final packages doubt furiously carefully bold theodolites. final (1 time(s)) > 246700 Customer#000246700 801786.5194000001 5244.71 CHINA o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc 28-466-828-8872 even asymptotes cajole slyly with the furiously bold accounts. furiously unusual platelets believe quickly final, (1 time(s)) > 6372220 Customer#006372220 793123.1516000001 2836.62 FRANCE bfd3hpM99xDp6AFsGNOPP 16-143-244-4177 regular theodolites are according to the unusual (1 time(s)) > {noformat} > Test_Failed: 2014/10/22 11:23:10.0010 - Verification failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)