Mailing-List: contact dev-help@drill.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@drill.apache.org
MIME-Version: 1.0
In-Reply-To: <D0507705-EBD4-421E-965C-429B9424BC26@mapr.com>
References: <CAMvaAPvLG3_E1_xPLhDxAdbsQxtFS4MfXwgsUr_WzSYy66T84Q@mail.gmail.com>
 <D0507705-EBD4-421E-965C-429B9424BC26@mapr.com>
From: Aman Sinha <amansinha@apache.org>
Date: Tue, 29 Aug 2017 18:00:21 -0700
Message-ID: <CAMvaAPvwjSsqqJ2gN0wP=9aJtV-mMK00ovQGcVVs2Lcw9AdbQA@mail.gmail.com>
Subject: Re: Drill 2.0 (design) hackathon
To: dev@drill.apache.org
Content-Type: multipart/alternative; boundary="001a1146449c64f73c0557ee0d4e"
archived-at: Wed, 30 Aug 2017 01:00:26 -0000

--001a1146449c64f73c0557ee0d4e
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Paul,
It certainly makes sense to have the API compatibility discussions during this
hackathon.  The 2.0 release may be a good checkpoint to introduce breaking
changes necessitating changes to the ODBC/JDBC drivers and other external
applications. As part of this exercise (not during the hackathon but as a
follow-up action), we also should clearly identify the "public" interfaces.


I will add this to the agenda.

thanks,
-Aman

On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <progers@mapr.com> wrote:

> Thanks Aman for organizing the Hackathon!
>
> The list included many good ideas for Drill 2.0. Some of those require
> changes to Drill’s “public” interfaces (file format, client protocol, SQL
> behavior, etc.)
>
> At present, Drill has no good mechanism to handle backward/forward
> compatibility at the API level. Protobuf versioning certainly helps, but
> can’t completely solve semantic changes (where a field changes meaning, or
> a non-Protobuf data chunk changes format.) As just one concrete example,
> changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class
> names and data formats will change.
>
> Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of
> breaking changes that introduce a versioning mechanism into our public
> APIs. Once these are in place, we can evolve the APIs in the future by
> following the newly-created versioning protocol.
>
> Without such a mechanism, we cannot support old & new clients in the same
> cluster. Nor can we support rolling upgrades. Of course, another solution
> is to get it right the second time, then freeze all APIs and agree to never
> again change them. Not sure we have sufficient access to a crystal ball to
> predict everything we’d ever need in our APIs, however...
>
> Thanks,
>
> - Paul
>
> > On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansinha@apache.org> wrote:
> >
> > Drill Developers,
> >
> > In order to kick-start the Drill 2.0 release discussions, I would like to
> > propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day™ :-) ).
> >
> > As I mentioned in the hangout on Tuesday, MapR has offered to host it on
> > Sept 18th at their offices at 350 Holger Way, San Jose. Hope that works
> > for most of you!
> >
> > The goal is to get the community together for a day-long technical
> > discussion on key topics in preparation for a Drill 2.0 release as well as
> > potential improvements in upcoming 1.xx releases. Depending on the
> > interest areas, we could form groups and have a volunteer lead each group.
> >
> > Based on prior discussions on the dev list, hangouts and existing JIRAs,
> > there is already a substantial set of topics and I have summarized a few of
> > them below. What other topics do folks want to talk about? Feel free to
> > respond to this thread and I will create a google doc to consolidate.
> > Understandably, the list would be long but we will use the hackathon to get
> > a sense of a reasonable feature set for 1.xx and 2.0 releases.
> >
> >
> > 1. Metadata management.
> >
> >  1a: Defining an abstraction layer for various types of metadata: views,
> > schema, statistics, security
> >
> >  1b: Underlying storage for metadata: what are the options and their
> > trade-offs?
> >
> >      - Hive metastore
> >
> >      - Parquet metadata cache (parquet specific)
> >
> >      - An embedded DBMS
> >
> >      - A distributed key-value store
> >
> >      - Others..
> >
> >
> >
> > 2. Drill integration with Apache Arrow
> >
> >  2a: Evaluate the choices and tradeoffs
> >
> >
> >
> > 3. Resource management
> >
> >  3a: Memory limits per query
> >
> >  3b: Spilling
> >
> >  3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> >
> >  3d: Local vs. global resource management
> >
> >  3e: Aligning with admission control/queueing
> >
> >
> >
> > 4. TPC-DS coverage and related planner/operator enhancements
> >
> >  4a: Additional set operations: INTERSECT, EXCEPT
> >
> >  4b: GROUPING SETS, ROLLUP, CUBE support
> >
> >  4c: Handling inequality joins and cartesian joins of non-scalar inputs
> > (via Nested Loop Join)
> >
> >  4d: Remaining gaps in correlated subquery
> >
> >  4e: Statistics: Number of Distinct Values, Histograms
> >
> >
> >
> > 5. Schema handling
> >
> >  5a: Creation, management of schema
> >
> >  5b: Handling schema changes in certain common cases
> >
> >  5c: Schema-awareness
> >
> >  5d: Others TBD
> >
> >
> >
> > 6. Concurrency
> >
> >  6a: What are the bottlenecks to achieving higher concurrency?
> >
> >  6b: Ideas to address these, e.g. async execution?
> >
> >
> >
> > 7. Storage plugins,  REST APIs related enhancements
> >
> >    <Topics TBD>
> >
> >
> >
> > 8. Performance improvements
> >
> >  8a: Filter pushdown
> >
> >  8b: Vectorized Parquet reader
> >
> >  8c: Code-gen improvements
> >
> >  8d: Others TBD
>
>

--001a1146449c64f73c0557ee0d4e--