Return-Path: X-Original-To: apmail-drill-dev-archive@www.apache.org Delivered-To: apmail-drill-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 12DC0183D4 for ; Mon, 12 Oct 2015 21:57:06 +0000 (UTC) Received: (qmail 3408 invoked by uid 500); 12 Oct 2015 21:57:05 -0000 Delivered-To: apmail-drill-dev-archive@drill.apache.org Received: (qmail 3352 invoked by uid 500); 12 Oct 2015 21:57:05 -0000 Mailing-List: contact dev-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list dev@drill.apache.org Received: (qmail 3339 invoked by uid 99); 12 Oct 2015 21:57:05 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 12 Oct 2015 21:57:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 1407BC2C41 for ; Mon, 12 Oct 2015 21:57:05 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.98 X-Spam-Level: *** X-Spam-Status: No, score=3.98 tagged_above=-999 required=6.31 tests=[HTML_MESSAGE=3, KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01] autolearn=disabled Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id krJ3CCcVhzOP for ; Mon, 12 Oct 2015 21:57:04 +0000 (UTC) Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 50BB52031C for ; Mon, 12 Oct 2015 21:57:03 +0000 (UTC) Received: by wicgb1 with SMTP id gb1so164992629wic.1 for ; Mon, 12 Oct 2015 14:57:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=jDhVfzJmR4ZC1jzrDkcBxYNJHhCZo7G86l6GLu6CvUM=; b=EiP0XQvihHwz1KbWgNqzEJc5l5w/KOnMLouFgKbMRZBu86DjV5Q6DPdSOHdKsh2266 NodtSDfosSZrrxEzprqf9ukwNx0nW3DgdzCwgCA3rfbScqOegp/LifRR7SszKHEIRoTg libTxam1BHBTiaKrzks4XxS1NF/lzDGTb2GIyEHKmeIHLtf9HUZ4q2XuCIo764M9uLDn h2dxZTOVf/Q17kU5U2Ut0i6Bk9rtevy/lpHeu/vcZ/UD7mUh91WDemuKc80239yfVquT qU7Xp+601IY6FbiaZLGkNPrArFo1nvDgBDCtl3PUeWe8CXsIWXz3soVt7sUEp3+gfOGu yUzQ== X-Gm-Message-State: ALoCoQnEapG2u109lEGGWwPm8Qn8oBavSCqC4MS0hyrmCZmLEUl5VCBn3jVvz6W5TfqAF1OuSO2j MIME-Version: 1.0 X-Received: by 10.180.87.74 with SMTP id v10mr15616546wiz.66.1444687021858; Mon, 12 Oct 2015 14:57:01 -0700 (PDT) Received: by 10.27.101.10 with HTTP; Mon, 12 Oct 2015 14:57:01 -0700 (PDT) In-Reply-To: References: Date: Mon, 12 Oct 2015 14:57:01 -0700 Message-ID: Subject: Re: flatten() function, scalar functions, nested ? From: Jacques Nadeau To: dev@drill.apache.org Content-Type: multipart/alternative; boundary=f46d0444032ebffc040521ef68dc --f46d0444032ebffc040521ef68dc Content-Type: text/plain; charset=UTF-8 It's funny, I've been looking at this stuff recently in the thinking about trying to fit Flatten into Calcite's UDTF construct. My inclination is that we basically have sugared the syntax for a lateral join with a udtf. I think this is a useful syntax when restricted to the select clause and think we should make it more general to all udtf's. What do others think? I think we should also add support for the more traditional syntax structure. Have you looked at Calcite's planning of lateral join and UDTFs as we may be able to push this sugared syntax into a clearer pattern. With regards to not allowing expression trees with flatten, I'm not sure that is a required constraint. -- Jacques Nadeau CTO and Co-Founder, Dremio On Mon, Oct 12, 2015 at 2:44 PM, Jinfeng Ni wrote: > Drill currently supports flatten(), which flattens elements in a > repeated field into a set of individual records [1]. > > My understanding is that flatten is not part of SQL standard (SQL > standard uses UNNEST). It's a concept borrowed from Google's > BigQuery[2]. However, the way Drill uses flatten seems to be > "different" from the way flatten is used in BiqQuery, or UNNEST in > SQL. > > 1) In Drill, flatten is more like a regular function; it could appear > in SELECT clause, ORDER BY, GROUP BY, but NOT in FROM clause. > In contrast, in BigQuery, flatten seems to only appear in the FROM > clause, in a way more like table function. > > 2) Because Drill treats flatten like a regular function, it could be > nested with a regular scalar function. I could not find such usage in > BigQuery (Please point me an example if someone finds an example). > > As such, Drill currently allows the following: > > Assuming (T1 has one row only and T1.intList : [ 1, 2, 3] ) > > Q1: > select flatten(T1.intList) + 1 > from > T1; > > Would return 3 rows: > 2, > 3, > 4 > > To me, it is kind of unusual usage to 1) allow flatten to appear > anywhere where a regular scalar function would appear, 2) allow > flatten to be nested with regular scalar function. > > Such "flexibility" probably has not been well tested. For instance, > if I put flatten() in WHERE clause, Drill would throw function > resolution error. > > Also, I'm not sure how Drill would do if we put flatten() nested with > regular scalar function inside window aggregate function, with clause > (CTE), etc? > > My question is : should Drill restrict flatten such that > > 1) it will only appear in SELECT clause for now (I understand it would > require significant re-work if we want to use BiqQuery way and use > flatten in FROM clause). > 2) it would not allow flatten to be nested with regular scalar function. > > With such restriction, the above Q1 could be rewritten into: > > select f + 1 > from ( select flatten(T1.intList) as f > from T1) > > I understand there is some discussion to improve flatten() in some > JIRAs. What does other think about adding this restriction to > flatten? > > > [1]. https://drill.apache.org/docs/flatten/ > [2]. https://cloud.google.com/bigquery/docs/data#flatten > --f46d0444032ebffc040521ef68dc--