drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (DRILL-4191) Last value function returns incorrect results.
Date Mon, 14 Dec 2015 22:39:46 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Deneche A. Hakim closed DRILL-4191.
-----------------------------------
    Resolution: Not A Problem
      Assignee: Deneche A. Hakim

According to the SQL standard, last_value for a row is the last row of it's "window frame"
and not partition. If you remove the ORDER BY clause from the query, all rows will share the
same frame which is the whole partition and all will get the same last_value:
{noformat}
select c1, last_value(c2) over(partition by c1) from dfs.tmp.md_627;
+-----+-------------+
| c1  |   EXPR$1    |
+-----+-------------+
| 1   | 2015-12-12  |
| 1   | 2015-12-12  |
| 1   | 2015-12-12  |
+-----+-------------+
3 rows selected (0.469 seconds)
{noformat}

> Last value function returns incorrect results.
> ----------------------------------------------
>
>                 Key: DRILL-4191
>                 URL: https://issues.apache.org/jira/browse/DRILL-4191
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.4.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>
> Last value function returns incorrect results.
> {noformat}
> [root@centos-01 ~]# cat MD627.csv
> 1,2015-01-01
> 1,2015-01-02
> 1,2015-12-12
> git commit id : b9068117
> create table md_627 as select cast(columns[0] as int) c1, cast(columns[1] as date) c2
from `MD627.csv`;
> [root@centos-01 parquet-tools]# ./parquet-schema ../md627/0_0_0.parquet
> message root {
>   optional int32 c1;
>   optional int32 c2 (DATE);
> }
> 0: jdbc:drill:schema=dfs.tmp> select * from md_627;
> +-----+-------------+
> | c1  |     c2      |
> +-----+-------------+
> | 1   | 2015-01-01  |
> | 1   | 2015-01-02  |
> | 1   | 2015-12-12  |
> +-----+-------------+
> 3 rows selected (0.265 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select c1, last_value(c2) over(partition by c1 order
by c2) lstval from md_627;
> +-----+-------------+
> | c1  |   lstval    |
> +-----+-------------+
> | 1   | 2015-01-01  |
> | 1   | 2015-01-02  |
> | 1   | 2015-12-12  |
> +-----+-------------+
> 3 rows selected (0.405 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select c1, last_value(c2) over(partition
by c1 order by c2) lstval from md_627;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(c1=[$0], lstval=[$1])
> 00-02        Project(c1=[$0], lstval=[$1])
> 00-03          Project(c1=[$0], $1=[$2])
> 00-04            Window(window#0=[window(partition {0} order by [1] range between UNBOUNDED
PRECEDING and CURRENT ROW aggs [LAST_VALUE($1)])])
> 00-05              SelectionVectorRemover
> 00-06                Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-07                  Project(c1=[$1], c2=[$0])
> 00-08                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///tmp/md_627]], selectionRoot=maprfs:/tmp/md_627, numFiles=1, usedMetadataFile=false,
columns=[`c1`, `c2`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message