drill-issues mailing list archives

From "Kristine Hahn (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-2431) Document behavior of floating point types
Date Fri, 13 Mar 2015 01:19:38 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristine Hahn resolved DRILL-2431.
----------------------------------
    Resolution: Fixed

Done. Added the following section to the docs:
{quote}
**Guidelines for Using Float and Double**
The float and double data types yield approximate results. They are variable-precision numeric
types: Drill does not convert every value exactly to the internal format, but instead stores
an approximation, so the value retrieved can differ slightly from the value stored. The
following guidelines are recommended:

* For calculations that require exact results, such as monetary calculations, use the decimal
type instead of float or double.
* For complex calculations or mission-critical applications, especially those involving infinity
and underflow, carefully consider the limitations of casting to or from float or double.
* Equality comparisons between floating-point values can produce unexpected results, for
example, when a float value is compared with a double value.

Float and double values that fall outside the supported range cause an error. Rounding can
occur if an input number has more precision than the type supports. Numbers so close to zero
that Drill cannot distinguish them from zero cause an underflow error.
{quote}
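The guidelines above can be illustrated outside Drill. The following Python sketch assumes that Drill's float and double follow IEEE 754 single and double precision; it emulates single precision with the standard `struct` module and uses Python's built-in float as the double. `to_float32` is a hypothetical helper, not a Drill API. The sketch shows the approximation each type stores for 0.1, an equality comparison that fails, and decimal arithmetic that stays exact.

```python
import math
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Approximate storage: 0.1 has no exact binary representation,
# so each type stores the nearest value it can represent.
print("double 0.1 is stored as", Decimal(0.1))
print("float  0.1 is stored as", Decimal(to_float32(0.1)))

# Equality comparisons: the sum of two approximations need not equal a third.
print(0.1 + 0.2 == 0.3)              # False
# A tolerance-based comparison is one common workaround.
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Exact results (for example, money): decimal arithmetic keeps base-10 values exact.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```

The tolerance `math.isclose` needs depends on the data; its default relative tolerance is only a starting point.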

> Document behavior of floating point types
> -----------------------------------------
>
>                 Key: DRILL-2431
>                 URL: https://issues.apache.org/jira/browse/DRILL-2431
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Kristine Hahn
>
>    Joining on columns of float and double data types produces a confusing result. Drill returns the same result as Postgres. Part of me feels that we should not follow Postgres blindly in this case; removing the implicit cast between float and double would be a better choice.
>    At a minimum, we should have a section in our documentation that discusses floating-point types, and this is a good example of how things can go wrong if the user does not understand the behavior.
> For an example of such a discussion, see the Postgres docs: http://www.postgresql.org/docs/9.1/static/datatype-numeric.html
> t1.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> t2.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> {code}
> create table t1(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end,
>         case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end
> from `t1.csv`;
> create table t2(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end,
>         case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end
> from `t2.csv`;
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.05 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.044 seconds)
> {code}
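To see where 997322.06 and 982209.1 come from, the sketch below emulates the CTAS step in Python, assuming Drill's float is IEEE 754 single precision: empty fields become None (standing in for SQL NULL) and other fields are rounded to the nearest single-precision value. Near one million, adjacent single-precision values are 0.0625 apart, so 997322.0399 is stored as 997322.0625 and 982209.1438 as 982209.125; the client presumably prints the shortest decimal string that identifies each stored value. `to_float32`, `float32_spacing`, and `load` are hypothetical helpers.

```python
import struct
from typing import List, Optional, Tuple

# Contents of t1.csv (t2.csv is identical), copied from the issue.
CSV_ROWS = [
    "997322.0399,997322.0399",
    "982209.1438,982209.1438",
    "997322,997322",
    "982209,982209",
    "963548,963548",
    "959310,959310",
]

Row = Tuple[Optional[float], Optional[float]]


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between the float32 nearest x and the next larger float32 (assumes x > 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)


def load(lines: List[str]) -> List[Row]:
    """Emulate the CASE/CAST expressions: '' -> None, otherwise float / double."""
    rows: List[Row] = []
    for line in lines:
        col0, col1 = line.split(",")
        c_float = None if col0 == "" else to_float32(float(col0))
        c_double = None if col1 == "" else float(col1)
        rows.append((c_float, c_double))
    return rows


t1 = load(CSV_ROWS)
for c_float, c_double in t1:
    print(c_float, c_double)
print("float32 spacing near 997322:", float32_spacing(997322.0))
```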
> Implicit cast: the result looks incorrect (only the integer-valued rows match), but in fact we can't expect this to work.
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.127 seconds)
> {code}
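The join result follows from the stored values. In the emulation below (same IEEE 754 assumption; `to_float32` is a hypothetical helper), widening a float to double is exact, so `t1.c_float = t2.c_double` compares 997322.0625 with 997322.0399 and fails. Only the integer-valued rows, which single precision represents exactly, find a match.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# (c_float, c_double) rows of t1 and t2, built from the identical CSV files.
VALUES = [997322.0399, 982209.1438, 997322.0, 982209.0, 963548.0, 959310.0]
t1 = [(to_float32(v), v) for v in VALUES]
t2 = [(to_float32(v), v) for v in VALUES]

# where t1.c_float = t2.c_double: the float side is widened to double.
# Python floats are already doubles, so the widening is exact and needs no extra step.
matches = [left + right for left in t1 for right in t2 if left[0] == right[1]]

for row in matches:
    print(row)
print(len(matches), "rows selected")
```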
> Explicit cast: same result
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as double) = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.136 seconds)
> {code}
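The explicit cast performs the same float-to-double widening as the implicit one, which is why the result is identical. The sketch below reproduces both under the same IEEE 754 assumption and, as a hypothetical alternative, compares at single precision by narrowing the double side instead, in which case all six rows match. `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


VALUES = [997322.0399, 982209.1438, 997322.0, 982209.0, 963548.0, 959310.0]
t1 = [(to_float32(v), v) for v in VALUES]
t2 = [(to_float32(v), v) for v in VALUES]

# cast(t1.c_float as double) = t2.c_double: the same exact widening as the implicit cast.
explicit = [left + right for left in t1 for right in t2 if float(left[0]) == right[1]]

# Hypothetical alternative: narrow t2.c_double to single precision and compare as floats.
narrowed = [left + right for left in t1 for right in t2 if left[0] == to_float32(right[1])]

print(len(explicit), "rows with the explicit cast")
print(len(narrowed), "rows when comparing at single precision")
```

Narrowing trades exactness for matching: any two doubles that round to the same float, such as 997322.05 and 997322.07, also compare equal.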



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
