drill-issues mailing list archives

From "Venki Korukanti (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-3271) Hive : Tpch 01.q fails with a verification issue for SF100 dataset
Date Wed, 08 Jul 2015 17:39:04 GMT

     [ https://issues.apache.org/jira/browse/DRILL-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venki Korukanti resolved DRILL-3271.
------------------------------------
    Resolution: Invalid

Just had a discussion with [~adeneche]. The floating-point differences between runs are due to
truncation in arithmetic operations and the order in which data arrives at the aggregator. The
differences here still seem to be within an acceptable range. We need to update the error-margin
constant in the test framework.
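
To illustrate the point about arrival order, here is a minimal Python sketch (not Drill code) showing that adding the same doubles in a different order can change the last digits of the sum. The helper name naive_sum is made up for this example; it only mimics an aggregator that accumulates values as they arrive.

```python
import math


def naive_sum(values):
    """Accumulate left to right, the way a streaming aggregator would."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three values, delivered in two different orders.
order_a = [0.1, 0.2, 0.3]
order_b = [0.3, 0.2, 0.1]

sum_a = naive_sum(order_a)
sum_b = naive_sum(order_b)

# Floating-point addition is not associative, so the two sums differ in the
# last bits even though the inputs are identical.
print(repr(sum_a), repr(sum_b), sum_a == sum_b)

# The results still agree to well within a small relative tolerance.
print(math.isclose(sum_a, sum_b, rel_tol=1e-12))
```

With billions of rows split across fragments, the order in which partial results reach the aggregator can vary from run to run, so small differences of this kind are expected.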

> Hive : Tpch 01.q fails with a verification issue for SF100 dataset
> ------------------------------------------------------------------
>
>                 Key: DRILL-3271
>                 URL: https://issues.apache.org/jira/browse/DRILL-3271
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Hive
>            Reporter: Rahul Challapalli
>            Assignee: Venki Korukanti
>             Fix For: 1.2.0
>
>         Attachments: tpch100_hive.ddl
>
>
> git.commit.id.abbrev=5f26b8b
> Query :
> {code}
> select
>   l_returnflag,
>   l_linestatus,
>   sum(l_quantity) as sum_qty,
>   sum(l_extendedprice) as sum_base_price,
>   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
>   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
>   avg(l_quantity) as avg_qty,
>   avg(l_extendedprice) as avg_price,
>   avg(l_discount) as avg_disc,
>   count(*) as count_order
> from
>   lineitem
> where
>   l_shipdate <= date '1998-12-01' - interval '120' day (3)
> group by
>   l_returnflag,
>   l_linestatus
> order by
>   l_returnflag,
>   l_linestatus;
> {code}
> The 4th column appears to have some differences. Not sure if they are within an acceptable range.
> Expected :
> {code}
> A       F       3.775127758E9   5.660776097194428E12    5.377736398183942E12    5.592847429515948E12    25.499370423275426      38236.11698430475       0.05000224353079674     148047881
> N       O       7.269911583E9   1.0901214476134316E13   1.0356163586785008E13   1.077041889123738E13    25.499873337396807      38236.997134222445      0.04999763132401859     285095988
> R       F       3.77572497E9    5.661603032745362E12    5.378513563915394E12    5.593662252666902E12    25.50006628406532       38236.69725845312       0.05000130433952159     148067261
> N       F       9.8553062E7     1.4777109838597995E11   1.403849659650348E11    1.459997930327757E11    25.501556956882876      38237.19938880449       0.04998528433803118     3864590
> {code}
> Actual : 
> {code}
> A       F       3.775127758E9   5.660776097194352E12    5.37773639818398E12     5.592847429515874E12    25.499370423275426      38236.11698430423       0.0500022435305286      148047881
> N       O       7.269911583E9   1.0901214476134352E13   1.0356163586784926E13   1.0770418891237576E13   25.499873337396807      38236.99713422257       0.04999763132535226     285095988
> R       F       3.77572497E9    5.661603032745394E12    5.378513563915313E12    5.593662252666848E12    25.50006628406532       38236.69725845333       0.05000130433925318     148067261
> N       F       9.8553062E7     1.4777109838598022E11   1.4038496596503506E11   1.45999793032776E11     25.501556956882876      38237.19938880456       0.049985284338093884    3864590
> {code}
> The data is 100 GB, so I couldn't attach it here.
> I attached the hive ddl. Let me know if you need anything else
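
For a rough sense of the size of the mismatch, the sketch below compares the avg_disc values (9th column) from the Expected and Actual blocks using a relative tolerance instead of exact equality. It is only an illustration in Python: REL_TOL and relative_diff are hypothetical names and the 1e-9 threshold is an assumed value, not the constant used by the Drill test framework.

```python
import math

# avg_disc values (9th column) copied from the Expected and Actual blocks.
AVG_DISC = [
    ("A F", 0.05000224353079674, 0.0500022435305286),
    ("N O", 0.04999763132401859, 0.04999763132535226),
    ("R F", 0.05000130433952159, 0.05000130433925318),
    ("N F", 0.04998528433803118, 0.049985284338093884),
]

# Assumed margin for this sketch only; the real framework constant may differ.
REL_TOL = 1e-9


def relative_diff(expected, actual):
    """Absolute difference scaled by the larger magnitude of the two values."""
    scale = max(abs(expected), abs(actual))
    return 0.0 if scale == 0.0 else abs(expected - actual) / scale


for group, expected, actual in AVG_DISC:
    print(group, relative_diff(expected, actual))

# Largest relative difference across the four groups.
worst = max(relative_diff(e, a) for _, e, a in AVG_DISC)

# Exact comparison fails for every group; a relative comparison passes.
exact_match = [e == a for _, e, a in AVG_DISC]
all_within = all(math.isclose(e, a, rel_tol=REL_TOL) for _, e, a in AVG_DISC)
print(worst, exact_match, all_within)
```

A relative rather than absolute margin is used here because the columns span very different magnitudes, from about 0.05 for avg_disc to about 1E13 for the sums.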



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
