impala-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Apple <jbap...@cloudera.com>
Subject Re: Querying column of double data type from (Kudu-based) table: artifact digits
Date Mon, 11 Sep 2017 15:53:14 GMT
68.01 is not exactly representable in IEEE double-precision floating point.
In Python:

>>> 68.01 * 100 - 6801
9.094947017729282e-13

One workaround is to use integers and store 6801 instead of 68.01. In
Impala, in non-Kudu tables, you can use DECIMAL.


On Mon, Sep 11, 2017 at 12:37 AM, Aleksei Maželis <olmazh@gmail.com> wrote:

> Hi,
>
> While querying, via Impala, a double-type data column from Kudu table, I
> have noticed that the original value is sometimes miss-represented with an
> artifact digit. Say, whereas "68.01" is stored in a column, querying it
> returns "68.01000000000001". What would be the proper way of handling (or
> going around) that?
>
> This can be reproduced e.g. as:
>
> > create TABLE test
> (
>   col1 BIGINT PRIMARY KEY,
>   col2 STRING,
>   col3 double
> ) PARTITION BY HASH(col1) PARTITIONS 2
> STORED AS KUDU TBLPROPERTIES ('kudu.master_addresses'='kudu-docker:7051',
> 'kudu.num_tablet_replicas' = '1', 'kudu.table_name'='test')
>
> > describe test;
> Query: describe test
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
> | name | type   | comment | primary_key | nullable | default_value |
> encoding      | compression         | block_size |
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
> | col1 | bigint |         | true        | false    |               |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
> | col2 | string |         | false       | true     |               |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
> | col3 | double |         | false       | true     |               |
> AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
> +------+--------+---------+-------------+----------+--------
> -------+---------------+---------------------+------------+
>
> > insert into test values (1, "test", 68.01);
>
> > select * from test;
> Query: select * from test
> +------+------+-------------------+
> | col1 | col2 | col3              |
> +------+------+-------------------+
> | 1    | test | 68.01000000000001 |
> +------+------+-------------------+
>
> Finally, checking that the digit at the end is an artifact:
>
> > select (col3-68.01)*10e20 from test;
> Query: select (col3-68.01)*10e20 from test
> +--------------------------+
> | (col3 - 68.01) * 1.0e+21 |
> +--------------------------+
> | 0                        |
> +--------------------------+
>
> Regards,
> Aleksei Maželis
>

Mime
View raw message