Return-Path: X-Original-To: apmail-drill-dev-archive@www.apache.org Delivered-To: apmail-drill-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7882018948 for ; Tue, 26 May 2015 18:53:24 +0000 (UTC) Received: (qmail 78827 invoked by uid 500); 26 May 2015 18:53:18 -0000 Delivered-To: apmail-drill-dev-archive@drill.apache.org Received: (qmail 78774 invoked by uid 500); 26 May 2015 18:53:18 -0000 Mailing-List: contact dev-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list dev@drill.apache.org Received: (qmail 78760 invoked by uid 99); 26 May 2015 18:53:18 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 26 May 2015 18:53:18 +0000 Date: Tue, 26 May 2015 18:53:17 +0000 (UTC) From: "Victoria Markman (JIRA)" To: dev@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Created] (DRILL-3182) Window function with DISTINCT qualifier returns seemingly incorrect result MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 Victoria Markman created DRILL-3182: --------------------------------------- Summary: Window function with DISTINCT qualifier returns seemingly incorrect result Key: DRILL-3182 URL: https://issues.apache.org/jira/browse/DRILL-3182 Project: Apache Drill Issue Type: Bug Components: Functions - Drill Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Daniel Barclay (Drill) Both count(distinct ) and count(all ) return the same result. It does not look correct to me and I'm not sure what the correct behavior is going to be. (1) Latest postgres does not support distinct with Window functions: postgres=# select a2, count(distinct b2) over(partition by a2) from t2; ERROR: DISTINCT is not implemented for window functions LINE 1: select a2, count(distinct b2) over(partition by a2) from t2; ^ postgres=# select a2, avg(distinct a2) over(partition by a2) from t2; ERROR: DISTINCT is not implemented for window functions LINE 1: select a2, avg(distinct a2) over(partition by a2) from t2; (2) Calcite does not support this either: https://github.com/apache/incubator-calcite/blob/master/doc/reference.md Do we support it ? If not, I think we should throw an error ... {code} 0: jdbc:drill:schema=dfs> select * from t2; +-----+--------+-------------+ | a2 | b2 | c2 | +-----+--------+-------------+ | 0 | zzz | 2014-12-31 | | 1 | aaaaa | 2015-01-01 | | 2 | bbbbb | 2015-01-02 | | 2 | bbbbb | 2015-01-02 | | 2 | bbbbb | 2015-01-02 | | 3 | ccccc | 2015-01-03 | | 4 | ddddd | 2015-01-04 | | 5 | eeeee | 2015-01-05 | | 6 | fffff | 2015-01-06 | | 7 | ggggg | 2015-01-07 | | 7 | ggggg | 2015-01-07 | | 8 | hhhhh | 2015-01-08 | | 9 | iiiii | 2015-01-09 | +-----+--------+-------------+ 13 rows selected (0.134 seconds) 0: jdbc:drill:schema=dfs> select a2, count(distinct b2) over(partition by a2) from t2; +-----+---------+ | a2 | EXPR$1 | +-----+---------+ | 0 | 1 | | 1 | 1 | | 2 | 3 | | 2 | 3 | | 2 | 3 | | 3 | 1 | | 4 | 1 | | 5 | 1 | | 6 | 1 | | 7 | 2 | | 7 | 2 | | 8 | 1 | | 9 | 1 | +-----+---------+ 13 rows selected (0.224 seconds) 0: jdbc:drill:schema=dfs> select a2, count(b2) over(partition by a2) from t2; +-----+---------+ | a2 | EXPR$1 | +-----+---------+ | 0 | 1 | | 1 | 1 | | 2 | 3 | | 2 | 3 | | 2 | 3 | | 3 | 1 | | 4 | 1 | | 5 | 1 | | 6 | 1 | | 7 | 2 | | 7 | 2 | | 8 | 1 | | 9 | 1 | +-----+---------+ 13 rows selected (0.219 seconds) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)