From: Aman Sinha
Date: Tue, 29 Aug 2017 18:00:21 -0700
Subject: Re: Drill 2.0 (design) hackathon
To: dev@drill.apache.org

Hi Paul,

certainly makes sense to have the API compatibility discussions during this
hackathon. The 2.0 release may be a good checkpoint to introduce breaking
changes necessitating changes to the ODBC/JDBC drivers and other external
applications. As part of this exercise (not during the hackathon but as a
follow-up action), we also should clearly identify the "public" interfaces.

I will add this to the agenda.

thanks,
-Aman

On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers wrote:

> Thanks Aman for organizing the Hackathon!
>
> The list included many good ideas for Drill 2.0. Some of those require
> changes to Drill’s “public” interfaces (file format, client protocol, SQL
> behavior, etc.)
>
> At present, Drill has no good mechanism to handle backward/forward
> compatibility at the API level. Protobuf versioning certainly helps, but
> can’t completely solve semantic changes (where a field changes meaning, or
> a non-Protobuf data chunk changes format.) As just one concrete example,
> changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class
> names and data formats will change.
>
> Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of
> breaking changes that introduce a versioning mechanism into our public
> APIs. Once these are in place, we can evolve the APIs in the future by
> following the newly-created versioning protocol.
>
> Without such a mechanism, we cannot support old & new clients in the same
> cluster. Nor can we support rolling upgrades. Of course, another solution
> is to get it right the second time, then freeze all APIs and agree to never
> again change them.
> Not sure we have sufficient access to a crystal ball to
> predict everything we’d ever need in our APIs, however...
>
> Thanks,
>
> - Paul
>
>
> On Aug 24, 2017, at 8:39 AM, Aman Sinha wrote:
> >
> > Drill Developers,
> >
> > In order to kick-start the Drill 2.0 release discussions, I would like to
> > propose a Drill 2.0 (design) hackathon (a.k.a Drill Developer Day™ :) ).
> >
> > As I mentioned in the hangout on Tuesday, MapR has offered to host it on
> > Sept 18th at their offices at 350 Holger Way, San Jose. Hope that works
> > for most of you!
> >
> > The goal is to get the community together for a day-long technical
> > discussion on key topics in preparation for a Drill 2.0 release as well as
> > potential improvements in upcoming 1.xx releases. Depending on the
> > interest areas, we could form groups and have a volunteer lead each group.
> >
> > Based on prior discussions on the dev list, hangouts and existing JIRAs,
> > there is already a substantial set of topics and I have summarized a few of
> > them below. What other topics do folks want to talk about? Feel free to
> > respond to this thread and I will create a google doc to consolidate.
> > Understandably, the list would be long but we will use the hackathon to get
> > a sense of a reasonable feature set for 1.xx and 2.0 releases.
> >
> >
> > 1. Metadata management.
> >
> > 1a: Defining an abstraction layer for various types of metadata: views,
> > schema, statistics, security
> >
> > 1b: Underlying storage for metadata: what are the options and their
> > trade-offs?
> >
> >   - Hive metastore
> >
> >   - Parquet metadata cache (parquet specific)
> >
> >   - An embedded DBMS
> >
> >   - A distributed key-value store
> >
> >   - Others..
> >
> >
> > 2. Drill integration with Apache Arrow
> >
> > 2a: Evaluate the choices and tradeoffs
> >
> >
> > 3. Resource management
> >
> > 3a: Memory limits per query
> >
> > 3b: Spilling
> >
> > 3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> >
> > 3d: Local vs. global resource management
> >
> > 3e: Aligning with admission control/queueing
> >
> >
> > 4. TPC-DS coverage and related planner/operator enhancements
> >
> > 4a: Additional set operations: INTERSECT, EXCEPT
> >
> > 4b: GROUPING SETS, ROLLUP, CUBE support
> >
> > 4c: Handling inequality joins and cartesian joins of non-scalar inputs
> > (via Nested Loop Join)
> >
> > 4d: Remaining gaps in correlated subquery
> >
> > 4e: Statistics: Number of Distinct Values, Histograms
> >
> >
> > 5. Schema handling
> >
> > 5a: Creation, management of schema
> >
> > 5b: Handling schema changes in certain common cases
> >
> > 5c: Schema-awareness
> >
> > 5d: Others TBD
> >
> >
> > 6. Concurrency
> >
> > 6a: What are the bottlenecks to achieving higher concurrency
> >
> > 6b: Ideas to address these..e.g async execution ?
> >
> >
> > 7. Storage plugins, REST APIs related enhancements
> >
> >
> > 8. Performance improvements
> >
> > 8a: Filter pushdown
> >
> > 8b: Vectorized Parquet reader
> >
> > 8c: Code-gen improvements
> >
> > 8d: Others TBD
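
As a rough illustration of the kind of versioning mechanism Paul's message asks for, the sketch below shows one common approach: each side advertises the range of API versions it can speak, and the connection uses the highest version both support. This is only a sketch under stated assumptions, not Drill's actual client handshake; `VersionRange` and `negotiate` are hypothetical names introduced here, and the version numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRange:
    """Inclusive range of API versions one side can speak (hypothetical type)."""
    min_version: int
    max_version: int


def negotiate(client: VersionRange, server: VersionRange) -> Optional[int]:
    """Return the highest version both sides support, or None if the ranges do not overlap."""
    low = max(client.min_version, server.min_version)
    high = min(client.max_version, server.max_version)
    return high if low <= high else None


# A server in the middle of a rolling upgrade that still speaks v1 and also v2.
server = VersionRange(1, 2)

old_client = VersionRange(1, 1)     # driver released before the API change
new_client = VersionRange(1, 2)     # driver that understands both versions
future_client = VersionRange(3, 3)  # driver newer than anything the server knows

print(negotiate(old_client, server))     # old client keeps working on v1
print(negotiate(new_client, server))     # new client gets v2
print(negotiate(future_client, server))  # no common version, connection is refused
```

With a rule like this, old and new clients can share one cluster as long as the server keeps the older version in its advertised range, which is the property the message says is missing today. It does not address semantic changes inside a version; those would still need an explicit version bump.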