drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
Date Mon, 29 Feb 2016 21:59:18 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172690#comment-15172690
] 

Jason Altekruse commented on DRILL-4447:
----------------------------------------

Couple of things here, the reason that the to_char function was not found is that it actually
requires a second parameter to specify the desired format. More info can be found on this
doc page: https://drill.apache.org/docs/data-type-conversion/#to_char

Using to_char correctly I was able to get the 3 distinct values back int he query, but your
original query looks like a bug running a distinct aggregate over date values. I am going
to raise the priority as this is a wrong result issue and try to look into it soon.

I do not understand why writing the data out to a file changes the result of the aggregation,
that seems pretty puzzling.



> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4447
>                 URL: https://issues.apache.org/jira/browse/DRILL-4447
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: Centos 6.2/Distributed/CDH5.4.9
>            Reporter: Ryan Clough
>         Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to what is a
minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, We want to select
the distinct dates (total: 3 days) from this list. To do this, I decided to use the TO_DATE
function, which does exactly what I want it.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-------------+
> |  data_date  |
> +-------------+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-------------+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for distinct records
(note that the 47 rows matches the 47 rows of distinct timestamps).
> My testing has also shown that this affect GROUP BY. I wouldn't be surprised if it manifested
its self elsewhere.
> I tried to get around the problem by converting the dates to a string using TO_CHAR:
surely drill will use the resulting strings to do the DISTINCT comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming
schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [to_char(DATE-OPTIONAL)].
 Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a string
with TO_CHAR(). I'm not sure what the underlying issue is here, but I thought it good to report
the issue.
> Please let me know if you need any further info, query plans, etc, but it should be reproducable
with the timestamps data I'll attach in a minute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message