incubator-general mailing list archives

From Julian Hyde <>
Subject Re: [DISCUSS] Druid incubation proposal
Date Fri, 16 Feb 2018 22:17:03 GMT
As Champion for this proposal, let me say that the Druid project will be an excellent addition
to the ASF. I have been an observer of the project for a couple of years, and in many respects
it is already operating in the Apache Way. Druid had paid developers from a number of companies,
some of whom were in competition, and its governance was strong enough to navigate the choppy
waters that this can create.

A number of Druid committers subsequently started to work on Apache projects (Gian on Calcite,
and Slim and Nishant on Hive) and so already know what to expect. 

You can get a sense of the project dynamic by reading the archives of their dev list:!forum/druid-development


> On Feb 16, 2018, at 12:15 PM, Gian Merlino <> wrote:
> Hi all,
> I would like to open up a discussion about incubating Druid at Apache. I've
> included a proposal in this mail and have also posted a draft at
> More information about
> Druid is also available on our project web site at:
> Thanks for your consideration!
> Gian
> = Druid Proposal =
> == Abstract ==
> Druid is a high-performance, column-oriented, distributed data store.
> == Proposal ==
> Druid is an open source data store designed for real-time exploratory
> analytics on large data sets. Druid's key features are a column-oriented
> storage layout, a distributed shared-nothing architecture, and the ability to
> generate and leverage indexing and caching structures. Druid is typically
> deployed in clusters of tens to hundreds of nodes, and has the ability to
> load data from Apache Kafka and Apache Hadoop, among other data sources.
> Druid offers two query languages: a SQL dialect (powered by Apache Calcite)
> and a JSON-over-HTTP API.
> Druid was originally developed to power a slice-and-dice analytical UI
> built on top of large event streams. The original use case for Druid
> targeted ingest rates of millions of records/sec, retention of over a year
> of data, and query latencies of sub-second to a few seconds. Many people
> can benefit from such capability, and many already have (see
> In addition, new use cases have
> emerged since Druid's original development, such as OLAP acceleration of
> data warehouse tables and more highly concurrent applications operating
> with relatively narrower queries.
> == Background ==
> Druid is a data store designed for fast analytics. It would typically be
> used in lieu of more general purpose query systems like Hadoop !MapReduce
> or Spark when query latency is of the utmost importance. Druid is often
> used as a data store for powering GUI analytical applications.
> The buzzwordy description of Druid is a high-performance, column-oriented,
> distributed data store. What we mean by this is:
> * "high performance": Druid aims to provide low query latency and high
> ingest rates.
> * "column-oriented": Druid stores data in a column-oriented format, like
> most other systems designed for analytics. It can also store indexes along
> with the columns.
> * "distributed": Druid is deployed in clusters, typically of tens to
> hundreds of nodes.
> * "data store": Druid loads your data and stores a copy of it on the
> cluster's local disks (and may cache it in memory). It doesn't query your
> data from some other storage system.
> == Rationale ==
> Druid is a mature, active project with a large number of production
> installations, dozens of contributors to each release, and multiple vendors
> offering professional support. Given Druid's strong community, its close
> integration with many other Apache projects (such as Kafka, Hadoop, and
> Calcite), and its pre-existing Apache-inspired governance structure, we
> feel that Apache is the best home for the project on a long-term basis.
> == Current Status ==
> === Meritocracy ===
> Since Druid was first open sourced the original developers have solicited
> contributions from others, including through our blog, the project mailing
> lists, and through accepting !GitHub pull requests. We have an
> Apache-inspired governance structure with a PMC and committers, and our
> committer ranks include a good number of people from outside the original
> development team.
> === Community ===
> The Druid core developers have sought to nurture a community throughout the
> life of the project. We use !GitHub as the focal point for bug reports and
> code contributions, and the mailing lists for most other discussion. To try
> to make people feel welcome, we've also spelled this out on a "CONTRIBUTE"
> link from the project page: Today we have an
> active contributor base (a typical release has ~40 contributors) and
> mailing list.
> === Core Developers ===
> Druid enjoys good diversity of committer affiliation. The most active
> developers over the past year are affiliated with four different companies:
> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
> committers on other ASF projects, including Apache Airflow, Apache
> Curator, and Apache Calcite. The original developers of Druid remain
> involved in the project.
> === Alignment ===
> Druid's current governance structure is Apache-inspired with a PMC and
> committers chosen by a meritocratic process. Additionally, Druid integrates
> with a number of other Apache projects, including Kafka, Hadoop, Hive,
> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
> == Known Risks ==
> === Orphaned products ===
> The risk of Druid becoming orphaned is low, due to a diverse committer base
> that is invested in the future of the project.
> === Inexperience with Open Source ===
> Druid's core developers have been running it as a community-oriented open
> source project for some time now, and many of them are committers on other
> open source projects as well, including Apache Airflow, Apache Curator, and
> Apache Calcite.
> === Homogenous Developers ===
> Druid's current diversity of committer affiliation means that we have
> become accustomed to working collaboratively and in the open. We hope that
> a transition to the ASF helps Druid's contributor base become even more
> diverse.
> === Reliance on Salaried Developers ===
> Druid's user base and contributor base skew heavily towards salaried
> developers. We believe this is natural since Druid is a technology designed
> to be deployed on large clusters, and due to this, tends to be deployed by
> organizations rather than by individuals. Nevertheless, many current Druid
> developers have continued working on the project even through job changes,
> which we take to be a good sign of developer commitment and personal
> interest.
> === Relationships with Other Apache Products ===
> Druid integrates with a number of other Apache projects. Druid internally
> uses Calcite for SQL planning, and Curator and !ZooKeeper for coordination.
> Druid can read data in Avro or Parquet format. Druid can load data from
> streams in Kafka or from files in Hadoop. Druid integrates with Hive as an
> option for SQL query acceleration. Druid data can be visualized by Superset
> (incubating).
> === An Excessive Fascination with the Apache Brand ===
> Druid is a successful project with a diverse community. The main reason for
> pursuing incubation is to find a stable, long-term home for the project
> with a well-known governance philosophy.
> == Required Resources ==
> === Mailing lists ===
> We would like to migrate the existing Druid mailing lists from Google
> Groups to Apache.
> * druid-user@googlegroups ->
> * druid-development@googlegroups ->
> === Source control ===
> Druid development currently takes place on !GitHub. We would like to
> continue using !GitHub, if possible, in order to preserve the workflows the
> community has developed around !GitHub pull requests.
> === Issue tracking ===
> Druid currently uses !GitHub issues for issue tracking. We would like to
> migrate to Apache JIRA at
> == Documentation ==
> Druid's documentation can be found at
> == Initial Source ==
> Druid was initially open-sourced by Metamarkets in 2012 and has been run in
> a community-governed fashion since then. The code is currently hosted at
> and includes the following repositories:
> * druid (primary repository)
> * druid-console (web console for Druid)
> * (source for Druid's website at
> * tranquility (realtime stream push client for Druid)
> * docker-druid (Docker image for Druid)
> * pydruid (Python library)
> * RDruid (R library)
> * oss-parent (Maven POM files)
> == Source and Intellectual Property Submission Plan ==
> A complete set of the open source code needs to be licensed from the owning
> organization to the Foundation. Commercial legal counsel for the owning
> organization will review the standard Foundation licensing paperwork and
> propose any updates as needed. This license will enable Apache to incubate
> and manage the Druid project moving forward.
> Other Druid paraphernalia to be transferred to Apache consists of:
> * !GitHub organization at
> * Twitter account at
> * "" domain name
> * "Druid" trademark assignment per Foundation standard paper.  The
> trademark assignment paperwork shall be reviewed by the owning
> organization's commercial and IP counsel
> * CLAs - all rights in the code licensed above should encompass the CLAs
> that existed between developers and owning organization
> A copyright license to the code, trademark assignment of Druid, and
> transfer of other paraphernalia to Apache should be sufficient to cover all
> rights required by Apache to operate the project.
> == External Dependencies ==
> External dependencies distributed with Druid currently all have one of the
> following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with one
> exception: the optional Druid MySQL metadata store extension depends on
> MySQL Connector/J, which is GPL licensed. Druid currently packages this as
> a separate download; see our current presentation on:
> As part of incubation we intend to
> determine the best strategy for handling the MySQL extension.
> == Cryptography ==
> Not applicable.
> == Initial Committers ==
> The initial committers for incubation are the current set of committers on
> Druid who have expressed interest in being involved in Apache incubation.
> Affiliations are listed where relevant. We may seek to add other committers
> during incubation; for example, we would want to add any current Druid
> committers who express an interest after incubation begins.
> * Charles Allen (Snap)
> * David Lim (Imply)
> * Eric Tschetter (Splunk)
> * Fangjin Yang (Imply)
> * Gian Merlino (Imply)
> * Himanshu Gupta (Oath)
> * Jihoon Son (Imply)
> * Jonathan Wei (Imply)
> * Maxime Beauchemin (Lyft)
> * Mohamed Slim Bouguerra (Hortonworks)
> * Nishant Bangarwa (Hortonworks)
> * Parag Jain (Oath)
> * Roman Leventov (Metamarkets)
> * Xavier Léauté (Confluent)
> == Sponsors ==
> * Champion: Julian Hyde
> * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
> * Sponsoring entity: Apache Incubator
