drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Clough (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
Date Fri, 26 Feb 2016 19:38:18 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169636#comment-15169636
] 

Ryan Clough commented on DRILL-4447:
------------------------------------

A bit more info: it seems to fail when encapsulated in a subquery like so: 

{code:sql}
WITH subq AS
(SELECT TO_DATE(data_date) as data_date FROM timestamps)
SELECT DISTINCT(data_date)
FROM subq;
{code}

But it DOES appear to WORK if you save them to their own table, and then run the query:

{code:sql}
0: jdbc:drill:> CREATE TABLE converted_dates AS
. . . . . . . > SELECT TO_DATE(data_date) as data_date FROM timestamps;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 47                         |
+-----------+----------------------------+
1 row selected (25.259 seconds)
0: jdbc:drill:> select distinct(data_date) from converted_dates;
+-------------+
|  data_date  |
+-------------+
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-24  |
+-------------+
3 rows selected (14.062 seconds)
{code}

> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4447
>                 URL: https://issues.apache.org/jira/browse/DRILL-4447
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: Centos 6.2/Distributed/CDH5.4.9
>            Reporter: Ryan Clough
>         Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to what is a
minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, We want to select
the distinct dates (total: 3 days) from this list. To do this, I decided to use the TO_DATE
function, which does exactly what I want it.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-------------+
> |  data_date  |
> +-------------+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-------------+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for distinct records
(note that the 47 rows matches the 47 rows of distinct timestamps).
> My testing has also shown that this affect GROUP BY. I wouldn't be surprised if it manifested
its self elsewhere.
> I tried to get around the problem by converting the dates to a string using TO_CHAR:
surely drill will use the resulting strings to do the DISTINCT comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming
schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [to_char(DATE-OPTIONAL)].
 Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a string
with TO_CHAR(). I'm not sure what the underlying issue is here, but I thought it good to report
the issue.
> Please let me know if you need any further info, query plans, etc, but it should be reproducable
with the timestamps data I'll attach in a minute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message