drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daniel Barclay (Drill/MapR) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1572) accuracy issue with tpch query 01.q and 10.q
Date Fri, 23 Jan 2015 07:09:34 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288885#comment-14288885
] 

Daniel Barclay (Drill/MapR) commented on DRILL-1572:
----------------------------------------------------

Are we and our code remembering that addition is not associative for limited-precision numeric
representations (such as IEEE-754/Java/etc. floating-point types)?


Is Drill supposed to compute a sum by adding the values in the same order each time?  If so,
expecting to get the exactly same sum value might be valid.

Or is Drill required only to add all the values together in _some_ order/grouping (that is,
not necessarily in the same order--e.g., computing sub-sums for batches and assimilating those
batches' sums in different orders in different runs)?  If so, we can't expect to get the same
sum value each time.  (And recall that in general the difference can be large--not just the
equivalent of a couple of least-significant bits/digits.)


> accuracy issue with tpch query 01.q and 10.q
> --------------------------------------------
>
>                 Key: DRILL-1572
>                 URL: https://issues.apache.org/jira/browse/DRILL-1572
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.7.0
>            Reporter: Chun Chang
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>             Fix For: 0.8.0
>
>
> code base:
> #Wed Oct 22 11:40:19 PDT 2014
> git.commit.id.abbrev=ae2790e
> The following two tpch queries failed verification due to accuracy in returned data.
> /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/01.q
:
> {noformat}
> -- using 1395599672 as a seed to the RNG
> select
>   l_returnflag,
>   l_linestatus,
>   sum(l_quantity) as sum_qty,
>   sum(l_extendedprice) as sum_base_price,
>   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
>   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
>   avg(l_quantity) as avg_qty,
>   avg(l_extendedprice) as avg_price,
>   avg(l_discount) as avg_disc,
>   count(*) as count_order
> from
>   lineitem
> where
>   l_shipdate <= date '1998-12-01' - interval '120' day (3)
> group by
>   l_returnflag,
>   l_linestatus
> order by
>   l_returnflag,
>   l_linestatus
> {noformat}
>          Expected number of rows: 4
> Actual number of rows from Drill: 4
>          Number of matching rows: 0
>           Number of rows missing: 4
>        Number of rows unexpected: 4
> {noformat}
> These rows are not expected (first 10):
> 	A	F	3.775127758E9	5.660776097194428E12	5.377736398183944E12	5.59284742951595E12	25.499370423275426
38236.11698430475	0.05000224353079674	148047881	
> 	N	O	7.269911583E9	1.0901214476134316E13	1.0356163586785012E13	1.0770418891237377E13
25.499873337396807	38236.997134222445	0.04999763132401859	285095988	
> 	R	F	3.77572497E9	5.661603032745363E12	5.378513563915393E12	5.593662252666899E12	25.50006628406532
38236.697258453125	0.050001304339521574	148067261	
> 	N	F	9.8553062E7	1.4777109838597995E11	1.4038496596503476E11	1.4599979303277576E11	25.501556956882876
38237.19938880449	0.04998528433803116	3864590	
> These rows are missing (first 10):
> 	A	F	3.775127758E9	5.660776097197787E12	5.377736398184481E12	5.592847429514863E12	25.499370423275426
38236.11698432743	0.05000224347714149	148047881	 (1 time(s))
> 	N	O	7.269911583E9	1.0901214476133223E13	1.0356163586779275E13	1.0770418891231504E13
25.499873337396807	38236.99713421861	0.04999763124732218	285095988	 (1 time(s))
> 	R	F	3.77572497E9	5.661603032743618E12	5.378513563916123E12	5.593662252665821E12	25.50006628406532
38236.69725844134	0.05000130428587516	148067261	 (1 time(s))
> 	N	F	9.8553062E7	1.4777109838598825E11	1.4038496596503897E11	1.4599979303278268E11	25.501556956882876
38237.19938880664	0.04998528433773886	3864590	 (1 time(s))
> {noformat}
> Test_Failed: 2014/10/22 11:26:11.0011 - Verification failed.
> /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/10.q
:
> {noformat}
> -- tpch10 using 1395599672 as a seed to the RNG
> select
>   c.c_custkey,
>   c.c_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
>   c.c_acctbal,
>   n.n_name,
>   c.c_address,
>   c.c_phone,
>   c.c_comment
> from
>   customer c,
>   orders o,
>   lineitem l,
>   nation n
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and o.o_orderdate >= date '1994-03-01'
>   and o.o_orderdate < date '1994-03-01' + interval '3' month
>   and l.l_returnflag = 'R'
>   and c.c_nationkey = n.n_nationkey
> group by
>   c.c_custkey,
>   c.c_name,
>   c.c_acctbal,
>   c.c_phone,
>   n.n_name,
>   c.c_address,
>   c.c_comment
> order by
>   revenue desc
> limit 20
> {noformat}
>          Expected number of rows: 20
> Actual number of rows from Drill: 20
>          Number of matching rows: 17
>           Number of rows missing: 3
>        Number of rows unexpected: 3
> {noformat}
> These rows are not expected (first 10):
> 	6372220	Customer#006372220	793123.1516	2836.62	FRANCE	bfd3hpM99xDp6AFsGNOPP	16-143-244-4177
 regular theodolites are according to the unusual 	
> 	14211121	Customer#014211121	796135.1836	7443.03	MOROCCO	ks7nhxDqzdk72CfWM	25-755-902-4219
lyly final packages doubt furiously carefully bold theodolites. final 	
> 	246700	Customer#000246700	801786.5193999999	5244.71	CHINA	o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc
28-466-828-8872	 even asymptotes cajole slyly with the furiously bold accounts. furiously
unusual platelets believe quickly final, 	
> These rows are missing (first 10):
> 	14211121	Customer#014211121	796135.1835999999	7443.03	MOROCCO	ks7nhxDqzdk72CfWM	25-755-902-4219
lyly final packages doubt furiously carefully bold theodolites. final	 (1 time(s))
> 	246700	Customer#000246700	801786.5194000001	5244.71	CHINA	o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc
28-466-828-8872	 even asymptotes cajole slyly with the furiously bold accounts. furiously
unusual platelets believe quickly final,	 (1 time(s))
> 	6372220	Customer#006372220	793123.1516000001	2836.62	FRANCE	bfd3hpM99xDp6AFsGNOPP	16-143-244-4177
 regular theodolites are according to the unusual	 (1 time(s))
> {noformat}
> Test_Failed: 2014/10/22 11:23:10.0010 - Verification failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message