drill-dev mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: Drill 2.0 (design) hackathon
Date Wed, 20 Sep 2017 15:20:42 GMT
Thanks All, it is really helpful.

On Wed, Sep 20, 2017 at 8:13 AM Charles Givre <cgivre@gmail.com> wrote:

> Thank you Aman for organizing and to MapR for hosting!
>
> On Wed, Sep 20, 2017 at 11:12 AM, Aman Sinha <amansinha@apache.org> wrote:
>
> > Thanks to all the folks who attended the hackathon - both local and remote.
> >   For the remote attendees, you missed out on a good dinner :)
> >
> > We had a day of excellent discussion on several topics:  Resource
> > management, operator level performance improvements, TPC-DS coverage,
> > metadata management, concurrency, usability and error handling, storage
> > plugins + REST APIs.   It will take a couple of days to compile all the
> > notes and we will post them.
> >
> > Since the focus was on in-depth discussion rather than breadth, and 1 day
> > is clearly not adequate, some topics were left out.  We can continue those
> > discussions on the dev list / hangout or, if they can wait, possibly take
> > them up in a future hackathon.
> >
> > -Aman
> >
> > On Fri, Sep 15, 2017 at 2:54 PM, Charles Givre <cgivre@gmail.com> wrote:
> >
> > > Hi Pritesh,
> > > What time do you think you’d want me to present?  Also, should I make
> > some
> > > slides?
> > > Best,
> > > — C
> > >
> > > > On Sep 15, 2017, at 13:23, Pritesh Maker <pmaker@mapr.com> wrote:
> > > >
> > > > Hi All
> > > >
> > > > We are looking forward to hosting the hackathon on Monday. Just a few
> > > updates on the logistics and agenda
> > > >
> > > > • We are expecting over 25 people to attend the event – you can see the
> > > > attendee list at the Eventbrite site - https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
> > > >
> > > > • Breakfast will be served starting at 8:30AM – we would like to
> begin
> > > promptly at 9AM
> > > >
> > > > • The agenda has been updated to reflect the speakers (see the update in
> > > > the sheet - https://docs.google.com/spreadsheets/d/1PEpgmBNAaPcu9UhWmZ8yPYtXbUGqOAYwH87alWkpCic/edit#gid=0 )
> > > > o Key Note & Introduction – Ted Dunning, Parth Chandra and Aman Sinha
> > > > o Community Contributions – Anil Kumar, John Omernik, Charles Givre
> and
> > > Ted Dunning
> > > > o Two tracks for technical design discussions – some topics already have
> > > > initial thoughts and others will be open brainstorming discussions
> > > > o Once the discussions are concluded, we will have summaries
> presented
> > > and notes shared with the community
> > > >
> > > > • We will have a WebEx for the first two sessions. For the two
> tracks,
> > > we will either continue the WebEx or have Hangout links (will publish
> > them
> > > to the google sheet)
> > > > "JOIN WEBEX MEETING
> > > >
> > > > https://mapr.webex.com/mapr/j.php?MTID=m9d39036e3953cce59ea81250c70c6c76
> > > > Meeting number (access code): 806 111 950
> > > > Meeting password: ApacheDrill"
> > > >
> > > > • For the attendees in person, we have made bookings for a dinner in
> > the
> > > evening - https://www.yelp.com/biz/chili-garden-restaurant-milpitas
> > > >
> > > > Looking forward to a fantastic day for the Apache Drill community!
> > > >
> > > > Thanks,
> > > > Pritesh
> > > >
> > > >
> > > >
> > > > On 9/5/17, 10:47 PM, "Aman Sinha" <amansinha@apache.org> wrote:
> > > >
> > > >    Here is the Eventbrite event for registration:
> > > >
> > > >    https://www.eventbrite.com/e/drill-developer-day-sept-2017-registration-7478463285
> > > >
> > > >    Please register so we can plan for food and drinks appropriately.
> > > >
> > > >    The link also contains a google doc link for the preliminary
> agenda
> > > and a
> > > >    'Topics' tab with volunteer sign-up column.  Please add your name
> to
> > > the
> > > >    area(s) of interest.
> > > >
> > > >    Thanks and look forward to seeing you all !
> > > >
> > > >    -Aman
> > > >
> > > >    On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <progers@mapr.com> wrote:
> > > >
> > > >> A partial list of Drill’s public APIs:
> > > >>
> > > >> IMHO, highest priority for Drill 2.0.
> > > >>
> > > >>
> > > >>  *   JDBC/ODBC drivers
> > > >>  *   Client (for JDBC/ODBC) + ODBC & JDBC
> > > >>  *   Client (for full Drill async, columnar)
> > > >>  *   Storage plugin
> > > >>  *   Format plugin
> > > >>  *   System/session options
> > > >>  *   Queueing (e.g. ZK-based queues)
> > > >>  *   REST API
> > > >>  *   Resource Planning (e.g. max query memory per node)
> > > >>  *   Metadata access, storage (e.g. file system locations vs. a metastore)
> > > >>  *   Metadata file formats (Parquet, views, etc.)
> > > >>
> > > >> Lower priority for future releases:
> > > >>
> > > >>
> > > >>  *   Query Planning (e.g. Calcite rules)
> > > >>  *   Config options
> > > >>  *   SQL syntax, especially Drill extensions
> > > >>  *   UDF
> > > >>  *   Management (e.g. JMX, REST API calls, etc.)
> > > >>  *   Drill File System (HDFS)
> > > >>  *   Web UI
> > > >>  *   Shell scripts
> > > >>
> > > >> There are certainly more. Please suggest those that are missing. I’ve
> > > >> taken a rough cut at which APIs need forward/backward compatibility first,
> > > >> in part based on those that are the “most public” and most likely to
> > > >> change. Others are important, but we can’t do them all at once.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> - Paul
> > > >>
> > > >> On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansinha@apache.org> wrote:
> > > >>
> > > >> Hi Paul,
> > > >> certainly makes sense to have the API compatibility discussions
> during
> > > this
> > > >> hackathon.  The 2.0 release may be a good checkpoint to introduce
> > > breaking
> > > >> changes necessitating changes to the ODBC/JDBC drivers and other
> > > external
> > > >> applications. As part of this exercise (not during the hackathon but
> > as
> > > a
> > > >> follow-up action), we also should clearly identify the "public"
> > > interfaces.
> > > >>
> > > >>
> > > >> I will add this to the agenda.
> > > >>
> > > >> thanks,
> > > >> -Aman
> > > >>
> > > >> On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <progers@mapr.com> wrote:
> > > >>
> > > >> Thanks Aman for organizing the Hackathon!
> > > >>
> > > >> The list included many good ideas for Drill 2.0. Some of those
> require
> > > >> changes to Drill’s “public” interfaces (file format, client
> protocol,
> > > SQL
> > > >> behavior, etc.)
> > > >>
> > > >> At present, Drill has no good mechanism to handle backward/forward
> > > >> compatibility at the API level. Protobuf versioning certainly helps,
> > but
> > > >> can’t completely solve semantic changes (where a field changes
> > meaning,
> > > or
> > > >> a non-Protobuf data chunk changes format.) As just one concrete
> > example,
> > > >> changing to Arrow will break pre-Arrow ODBC/JDBC drivers because
> class
> > > >> names and data formats will change.
> > > >>
> > > >> Perhaps we can prioritize, for the proposed 2.0 release, a one-time
> > set
> > > of
> > > >> breaking changes that introduce a versioning mechanism into our
> public
> > > >> APIs. Once these are in place, we can evolve the APIs in the future
> by
> > > >> following the newly-created versioning protocol.
> > > >>
> > > >> Without such a mechanism, we cannot support old & new clients in the same
> > > >> cluster. Nor can we support rolling upgrades. Of course, another solution
> > > >> is to get it right the second time, then freeze all APIs and agree to never
> > > >> again change them. Not sure we have sufficient access to a crystal ball to
> > > >> predict everything we’d ever need in our APIs, however...
> > > >>
> > > >> Thanks,
> > > >>
> > > >> - Paul
> > > >>
> > > >> On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansinha@apache.org> wrote:
> > > >>
> > > >> Drill Developers,
> > > >>
> > > >> In order to kick-start the Drill 2.0 release discussions, I would like to
> > > >> propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day™ :) ).
> > > >>
> > > >> As I mentioned in the hangout on Tuesday,  MapR has offered to host
> it
> > > on
> > > >> Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that
> > > works
> > > >> for most of you!
> > > >>
> > > >> The goal is to get the community together for a day-long technical
> > > >> discussion on key topics in preparation for a Drill 2.0 release as
> > well
> > > >> as
> > > >> potential improvements in upcoming 1.xx releases.  Depending on the
> > > >> interest areas, we could form groups and have a volunteer lead each
> > > >> group.
> > > >>
> > > >> Based on prior discussions on the dev list, hangouts and existing JIRAs,
> > > >> there is already a substantial set of topics and I have summarized a few of
> > > >> them below.   What other topics do folks want to talk about?   Feel free to
> > > >> respond to this thread and I will create a google doc to consolidate.
> > > >> Understandably, the list would be long but we will use the hackathon to get
> > > >> a sense of a reasonable feature set for 1.xx and 2.0 releases.
> > > >>
> > > >>
> > > >> 1. Metadata management.
> > > >>
> > > >> 1a: Defining an abstraction layer for various types of metadata:
> > views,
> > > >> schema, statistics, security
> > > >>
> > > >> 1b: Underlying storage for metadata: what are the options and their
> > > >> trade-offs?
> > > >>
> > > >>    - Hive metastore
> > > >>
> > > >>    - Parquet metadata cache (parquet specific)
> > > >>
> > > >>    - An embedded DBMS
> > > >>
> > > >>    - A distributed key-value store
> > > >>
> > > >>    - Others..
> > > >>
> > > >>
> > > >>
> > > >> 2. Drill integration with Apache Arrow
> > > >>
> > > >> 2a: Evaluate the choices and tradeoffs
> > > >>
> > > >>
> > > >>
> > > >> 3. Resource management
> > > >>
> > > >> 3a: Memory limits per query
> > > >>
> > > >> 3b: Spilling
> > > >>
> > > >> 3c: Resource management with Drill on Yarn/Mesos/Kubernetes
> > > >>
> > > >> 3d: Local vs. global resource management
> > > >>
> > > >> 3e: Aligning with admission control/queueing
> > > >>
> > > >>
> > > >>
> > > >> 4. TPC-DS coverage and related planner/operator enhancements
> > > >>
> > > >> 4a: Additional set operations: INTERSECT, EXCEPT
> > > >>
> > > >> 4b: GROUPING SETS, ROLLUP, CUBE support
> > > >>
> > > >> 4c: Handling inequality joins and cartesian joins of non-scalar inputs
> > > >> (via Nested Loop Join)
> > > >>
> > > >> 4d: Remaining gaps in correlated subquery
> > > >>
> > > >> 4e: Statistics: Number of Distinct Values, Histograms
> > > >>
> > > >>
> > > >>
> > > >> 5. Schema handling
> > > >>
> > > >> 5a: Creation, management of schema
> > > >>
> > > >> 5b: Handling schema changes in certain common cases
> > > >>
> > > >> 5c: Schema-awareness
> > > >>
> > > >> 5d: Others TBD
> > > >>
> > > >>
> > > >>
> > > >> 6. Concurrency
> > > >>
> > > >> 6a: What are the bottlenecks to achieving higher concurrency?
> > > >>
> > > >> 6b: Ideas to address these, e.g. async execution?
> > > >>
> > > >>
> > > >>
> > > >> 7. Storage plugins, REST API related enhancements
> > > >>
> > > >>  <Topics TBD>
> > > >>
> > > >>
> > > >>
> > > >> 8. Performance improvements
> > > >>
> > > >> 8a: Filter pushdown
> > > >>
> > > >> 8b: Vectorized Parquet reader
> > > >>
> > > >> 8c: Code-gen improvements
> > > >>
> > > >> 8d: Others TBD
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > >
> > >
> >
>
-- 
Thanks & Regards,
B Anil Kumar.
