drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Clough (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
Date Mon, 29 Feb 2016 22:34:18 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172751#comment-15172751
] 

Ryan Clough commented on DRILL-4447:
------------------------------------

[~jaltekruse] Thanks for the clarification on the to_char function, I had a feeling at least
part of this might have been user error, but was hard to determine while focusing on the distinct/group
by bug which was my main issue.

Glad to have caught a more serious bug though. I initially caught it while using it in a GROUP
BY where my results were way bigger than I expected, and managed to discover it affected DISTINCT()
while trying to investigate the GROUP BY issue. Happy to provide any additional info if needed!

> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4447
>                 URL: https://issues.apache.org/jira/browse/DRILL-4447
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: Centos 6.2/Distributed/CDH5.4.9
>            Reporter: Ryan Clough
>            Assignee: Jason Altekruse
>            Priority: Critical
>         Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to what is a
minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, We want to select
the distinct dates (total: 3 days) from this list. To do this, I decided to use the TO_DATE
function, which does exactly what I want it.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-------------+
> |  data_date  |
> +-------------+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-------------+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for distinct records
(note that the 47 rows matches the 47 rows of distinct timestamps).
> My testing has also shown that this affect GROUP BY. I wouldn't be surprised if it manifested
its self elsewhere.
> I tried to get around the problem by converting the dates to a string using TO_CHAR:
surely drill will use the resulting strings to do the DISTINCT comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming
schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [to_char(DATE-OPTIONAL)].
 Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a string
with TO_CHAR(). I'm not sure what the underlying issue is here, but I thought it good to report
the issue.
> Please let me know if you need any further info, query plans, etc, but it should be reproducable
with the timestamps data I'll attach in a minute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message