drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kristine Hahn (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-2431) Document behavior of floating point types
Date Wed, 18 Mar 2015 21:32:38 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360730#comment-14360730
] 

Kristine Hahn edited comment on DRILL-2431 at 3/18/15 9:32 PM:
---------------------------------------------------------------

Tks, [~dbarclay@maprtech.com], changed as follows:
{quote}
Values of FLOAT and DOUBLE that are less than the lowest value in the range (more negative)
cause an error. Rounding can occur if the precision of an input number is too high. -deleted-Numbers
approaching zero that Drill cannot distinguish from zero cause an underflow error.-deleted-
{quote}

Review copy is now posted on the Drill wiki: http://drill.apache.org/docs/handling-different-data-types#guidelines-for-using-float-and-double


was (Author: krishahn):
Tks, [~dbarclay@maprtech.com], changed as follows:
{quote}
Values of FLOAT and DOUBLE that are less than the lowest value in the range (more negative)
cause an error. Rounding can occur if the precision of an input number is too high. -deleted-Numbers
approaching zero that Drill cannot distinguish from zero cause an underflow error.-deleted-
{quote}

> Document behavior of floating point types
> -----------------------------------------
>
>                 Key: DRILL-2431
>                 URL: https://issues.apache.org/jira/browse/DRILL-2431
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Kristine Hahn
>
>    Joining on columns of float and double data type produces confusing result. Drill
returns the same result as postgres. Part of me feels that we should not follow postgres blindly
in this case, removing implicit cast between float and double would be better choice.
>    At a minimum we should have a section in our documentation that discusses floating
point types and this is a good example on how things can go wrong if user does not understand
the behavior.
> Example of such a discussion in Postgres docs: http://www.postgresql.org/docs/9.1/static/datatype-numeric.html
> t1.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> t2.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> {code}
> create table t1(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else cast(columns[0] as float)
end,
>         case when columns[1] = '' then cast(null as double) else cast(columns[1] as double)
end
> from `t1.csv`;
> create table t2(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else cast(columns[0] as float)
end,
>         case when columns[1] = '' then cast(null as double) else cast(columns[1] as double)
end
> from `t2.csv`;
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.05 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.044 seconds)
> {code}
> Implicit cast: looks incorrect, but in fact we can't expect this to work.
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.127 seconds)
> {code}
> Explicit cast: same
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as double) =
t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.136 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message