drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: Drill 2.0 (design) hackathon
Date Tue, 29 Aug 2017 21:08:23 GMT
Thanks Aman for organizing the Hackathon!

The list included many good ideas for Drill 2.0. Some of those require changes to Drill’s
“public” interfaces (file format, client protocol, SQL behavior, etc.).

At present, Drill has no good mechanism to handle backward/forward compatibility at the API
level. Protobuf versioning certainly helps, but it cannot fully handle semantic changes
(where a field changes meaning, or a non-Protobuf data chunk changes format). As just one
concrete example, changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class names
and data formats will change.

Perhaps we can prioritize, for the proposed 2.0 release, a one-time set of breaking changes
that introduce a versioning mechanism into our public APIs. Once these are in place, we can
evolve the APIs in the future by following the newly-created versioning protocol.
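
To make the idea concrete, here is a minimal sketch of what such a versioning protocol might look like at connect time. This is not Drill code: the names `VersionRange` and `negotiate`, and the use of plain integer API versions, are assumptions made purely for illustration. Each side advertises the range of API versions it can speak, and the two agree on the highest version in common, or fail cleanly if there is none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VersionRange:
    """Inclusive range of API versions one side can speak (hypothetical type)."""
    min_version: int
    max_version: int


def negotiate(client: VersionRange, server: VersionRange) -> Optional[int]:
    """Return the highest API version both sides support, or None if the ranges do not overlap."""
    lo = max(client.min_version, server.min_version)
    hi = min(client.max_version, server.max_version)
    return hi if lo <= hi else None


# Assumption for illustration: version 1 is the pre-Arrow wire format, version 2 the Arrow-based one.
server = VersionRange(1, 2)      # a server that still speaks both formats
old_client = VersionRange(1, 1)  # a pre-Arrow driver
new_client = VersionRange(2, 2)  # an Arrow-only driver

print(negotiate(old_client, server))          # old driver keeps working on version 1
print(negotiate(new_client, server))          # new driver uses version 2
print(negotiate(VersionRange(3, 4), server))  # no overlap, so the connection is refused
```

With something along these lines in the handshake, a server could serve old and new clients side by side, and could reject an unsupported client with a clear error rather than a decoding failure.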

Without such a mechanism, we cannot support old and new clients in the same cluster. Nor
can we support rolling upgrades. Of course, another solution is to get it right the second
time, then freeze all APIs and agree to never again change them. Not sure we have sufficient
access to a crystal ball to predict everything we’d ever need in our APIs, however...

Thanks,

- Paul

> On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansinha@apache.org> wrote:
> 
> Drill Developers,
> 
> In order to kick-start the Drill 2.0 release discussions, I would like to
> propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day ™ :-) ).
> 
> As I mentioned in the hangout on Tuesday,  MapR has offered to host it on
> Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that works
> for most of you!
> 
> The goal is to get the community together for a day-long technical
> discussion on key topics in preparation for a Drill 2.0 release as well as
> potential improvements in upcoming 1.xx releases.  Depending on the
> interest areas, we could form groups and have a volunteer lead each group.
> 
> Based on prior discussions on the dev list, hangouts and existing JIRAs,
> there is already a substantial set of topics and I have summarized a few of
> them below.   What other topics do folks want to talk about?   Feel free to
> respond to this thread and I will create a google doc to consolidate.
> Understandably, the list would be long but we will use the hackathon to get
> a sense of a reasonable feature set for 1.xx and 2.0 releases.
> 
> 
> 1. Metadata management.
> 
>  1a: Defining an abstraction layer for various types of metadata: views,
> schema, statistics, security
> 
>  1b: Underlying storage for metadata: what are the options and their
> trade-offs?
> 
>      - Hive metastore
> 
>      - Parquet metadata cache (parquet specific)
> 
>      - An embedded DBMS
> 
>      - A distributed key-value store
> 
>      - Others..
> 
> 
> 
> 2. Drill integration with Apache Arrow
> 
>  2a: Evaluate the choices and tradeoffs
> 
> 
> 
> 3. Resource management
> 
>  3a: Memory limits per query
> 
>  3b: Spilling
> 
>  3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> 
>  3d: Local vs. global resource management
> 
>  3e: Aligning with admission control/queueing
> 
> 
> 
> 4. TPC-DS coverage and related planner/operator enhancements
> 
>  4a: Additional set operations: INTERSECT, EXCEPT
> 
>  4b: GROUPING SETS, ROLLUP, CUBE support
> 
>  4c: Handling inequality joins and cartesian joins of non-scalar inputs
> (via Nested Loop Join)
> 
>  4d: Remaining gaps in correlated subquery
> 
>  4e: Statistics: Number of Distinct Values, Histograms
> 
> 
> 
> 5. Schema handling
> 
>  5a: Creation, management of schema
> 
>  5b: Handling schema changes in certain common cases
> 
>  5c: Schema-awareness
> 
>  5d: Others TBD
> 
> 
> 
> 6. Concurrency
> 
>  6a: What are the bottlenecks to achieving higher concurrency?
> 
>  6b: Ideas to address these, e.g. async execution?
> 
> 
> 
> 7. Storage plugins, REST API related enhancements
> 
>    <Topics TBD>
> 
> 
> 
> 8. Performance improvements
> 
>  8a: Filter pushdown
> 
>  8b: Vectorized Parquet reader
> 
>  8c: Code-gen improvements
> 
>  8d: Others TBD
