hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15640) Hive/Druid integration: null handling for metrics
Date Fri, 05 Jan 2018 19:46:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313748#comment-16313748
] 

Jesus Camacho Rodriguez commented on HIVE-15640:
------------------------------------------------

[~nishantbangarwa], I think this is fixed now? Can we close pointing to issue that fixed the
issue? Thanks

> Hive/Druid integration: null handling for metrics
> -------------------------------------------------
>
>                 Key: HIVE-15640
>                 URL: https://issues.apache.org/jira/browse/HIVE-15640
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>
> Null values for metrics in Druid and Hive are not handled the same way (_0.0_ vs _NULL_).
> In Druid:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006	1999-11-03 00:00:00	0.0	0.0
> 9011009	1999-11-05 00:00:00	0.0	0.0
> 8003009	1999-11-03 00:00:00	11.0	1.0299999713897705
> 10005014	1999-11-05 00:00:00	86.0	1.100000023841858
> 6008007	1999-11-09 00:00:00	81.0	1.3700000047683716
> 6003003	1999-11-08 00:00:00	45.0	1.600000023841858
> 8008009	1999-11-08 00:00:00	98.0	1.7100000381469727
> 8015003	1999-11-02 00:00:00	10.0	1.7400000095367432
> 8004008	1999-11-10 00:00:00	45.0	1.7599999904632568
> 8009009	1999-11-07 00:00:00	81.0	1.7699999809265137
> {code}
> In Hive:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset_hive
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> 6015006	1999-11-03 00:00:00	NULL	NULL
> 9011009	1999-11-05 00:00:00	NULL	NULL
> 8003009	1999-11-03 00:00:00	11	1.03
> 10005014	1999-11-05 00:00:00	86	1.1
> 6008007	1999-11-09 00:00:00	81	1.37
> 6003003	1999-11-08 00:00:00	45	1.6
> 8008009	1999-11-08 00:00:00	98	1.71
> 8015003	1999-11-02 00:00:00	10	1.74
> 8004008	1999-11-10 00:00:00	45	1.76
> 8009009	1999-11-07 00:00:00	81	1.77
> {code}
> However, for Druid dimensions, NULL values seem to be handled properly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message