incubator-general mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: [PROPOSAL] Superset Proposal for Apache Incubator
Date Tue, 25 Apr 2017 06:32:56 GMT
"Airbnb will submit a Software Grant Agreement (SGA) as Superset joins the
incubator."

Should I add this sentence in the proposal?

Max

On Mon, Apr 24, 2017 at 5:48 AM, John D. Ament <johndament@apache.org>
wrote:

> I missed this discussion.  In your IP section, you list out:
>
> == Source and Intellectual Property Submission Plan ==
> We do not expect any complications for the submission of the Superset code
> base.  Our code is already in Github and there is only a single code base.
>
> This is, IMHO, not clear.  Does Airbnb plan to submit an SGA for Superset, or
> expect that no SGA is required because it's Apache licensed?
>
> John
>
> On Sun, Apr 2, 2017 at 4:09 PM Jeff Feng <jeff.feng@airbnb.com.invalid>
> wrote:
>
> > Dear Apache Incubator Community,
> >
> > We are excited to share our proposal for discussion and feedback for
> > entering Apache Incubation.  Superset is an enterprise-ready web
> > application for data exploration, data visualization and dashboarding.
> >
> > Our Incubation proposal is at the following Wiki as well as copied in the
> > email below:
> >
> > https://wiki.apache.org/incubator/SupersetProposal
> >
> > We have an active Superset community including 400+ members and nearly
> > 200 topics.  The Google Group can be found below.  We plan to move the
> > discussion to the ASF:
> >
> > https://groups.google.com/forum/#!forum/airbnb_superset
> >
> > Thank you and look forward to the discussion!
> >
> > Jeff, Max & Alanna
> >
> >
> >
> >
> > = Superset =
> >
> > == Abstract ==
> >
> > Superset is an enterprise-ready web application for data exploration,
> > data visualization and dashboarding.
> >
> > == Proposal ==
> >
> > Superset is business intelligence (BI) software that helps modern
> > organizations visualize and interact with their data. Superset enables
> > users to explore data from a variety of databases, assemble beautiful
> > dashboards and share their findings.  Superset works neatly with all
> > modern SQL-speaking databases, and integrates with Druid.io to provide
> > real-time, interactive, blazing fast data access to large datasets.
> >
> > == Background ==
> >
> > Data is mission critical. To succeed in this era, organizations need to
> > provide low-friction, intuitive and interactive access to data. It is
> > paramount for knowledge workers to be capable of answering their own
> > questions by querying, exploring and visualizing data.
> >
> > The entire business intelligence industry has pivoted from a model of
> > centralized top-down platforms driven by IT organizations to self-service
> > analytics and agile workflows by any user.  This shift unblocks
> > centralized service bottlenecks for creating data visualizations while
> > also creating an environment that is iterative and fast-moving.  This
> > means that business intelligence software must also be easy and
> > delightful to use.  Self-service analytics doesn’t mean that admin and
> > governance features are not needed.
> >
> > Modern BI tools provide fine-grain access controls and auditing
> > capabilities to understand how data is being used.  Superset is a
> > solution that delivers on all of these vectors.
> >
> > The technology stack is also constantly morphing - vendors are struggling
> > to provide cheap, quick and easy solutions to access data.  Business
> > intelligence users are finding existing solutions lacking as these
> > software products either disregard or react slowly to recent
> > game-changing technologies like Druid.io, PrestoDB, Apache Drill, Apache
> > Kylin, d3.js, React.js and iPython’s Jupyter for instance.
> >
> > == Rationale ==
> >
> > Business intelligence is more relevant today than at any other point in
> > history.  Organizations are currently very limited in options for open
> > source data visualization solutions, especially solutions that are both
> > self-service and enterprise-ready.  Every company informing their
> > decisions with data needs a BI tool.
> >
> > We believe that Superset will be a strong complement to existing Apache
> > Software Foundation technologies by offering scalable user interactions
> > to distributed storage and computation solutions.  Users will often find
> > that Superset can act as a catalyst for tooling that can visualize the
> > byproduct of data and computation infrastructure.
> >
> > Superset has many key design elements that help fill a gap in current
> > solutions for organizations:
> >
> > * Easy, low friction access to data through a simple, web-based data
> > exploration interface.  Composing charts and dashboards is intuitive.
> > Eliminating the need to write code or SQL empowers anyone to use it.
> >
> > * Access to a wide array of rich, interactive data visualization types.
> >
> > * Enterprise-ready: Integration with different authentication mechanisms
> > and granular permissions centered around actions and data access.
> >
> > * Realtime & fast: Superset provides realtime analytics at the speed of
> > thought on very large datasets when integrated with Druid.io.
> >
> > * Broad data access: Consume data out of any SQL-speaking relational
> > database.
> >
> > * Extensible: Can be extended to talk to many NoSQL databases like Apache
> > Drill, Elasticsearch, and other popular database engines.
> >
> > * Fast loading dashboards with configurable web-scale caching.
> >
> > * Plug-in framework that enables organizations to build custom analytical
> > applications with new UI/UX interfaces.
> >
> > * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
> > with more flexibility.  SQL Lab integrates with the visualization engine
> > seamlessly.
> >
> > == Initial Goals ==
> >
> > The initial goals of the Superset project are several-fold:
> >
> > Move the existing codebase to Apache and integrate with the Apache
> > development process.
> >
> > Redesign the user interface and interaction model for creating
> > visualizations/dashboards and connecting to data sources
> >
> > Build robust support for security and governance of the tool including
> > popular authorization modules (including Apache Ranger and Apache Sentry)
> > and a more sophisticated permissions system
> >
> > Grow the extensibility of the project both in terms of enhanced
> > connectivity to NoSQL-based data sources and creating a plug-in framework
> > that enables organizations to build custom analytical applications which
> > require a new UI/UX
> >
> > == Current Status ==
> >
> > By many standards, Superset is already a successful open source project.
> > As of March 2017, Superset is officially used in production at about a
> > dozen companies, has received contributions from over one hundred
> > contributors on Github, and has 1500+ forks and 12k+ stars.
> >
> > Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
> > significant contributions, and expressed their commitment to the project.
> > The product is feature complete and has been viable for months. It
> > already serves as the main interface for consuming data at many companies
> > of different sizes.
> >
> > While the product is usable, there’s room for improvement across the
> > board, starting with providing a smoother user experience around content
> > creation, making sure all features work out-of-the-box on more platforms
> > and databases, providing better user training guides and videos, having a
> > predictable release process, and increasing the overall quality of the
> > Superset releases.
> >
> > === Meritocracy ===
> >
> > We plan to invest in supporting a meritocracy. We will discuss the
> > requirements in an open forum. Several companies have expressed interest
> > in this project, and we intend to invite additional developers to
> > participate.  We will encourage and monitor community participation so
> > that privileges can be extended to those that contribute.
> >
> > === Community ===
> >
> > The need for an enterprise-ready data visualization and exploration
> > platform in the open source community is tremendous.  While Superset is
> > fairly well known, recognized and used within the Druid.io community,
> > adoption is currently limited outside of that niche. There is a huge
> > opportunity to grow the community to hundreds if not thousands of
> > organizations, and we are hoping that embracing “the Apache way” will
> > accelerate the growth of our community.
> >
> > We have already been active at seeking and inviting contributions, and
> > are planning to scale the project by investing time and growing the
> > support structure to grow the community.
> >
> > === Core Developers ===
> >
> > The initial committers for Superset include experienced full stack,
> > front-end and data engineers:
> >
> > * Maxime Beauchemin (Airbnb)
> >
> > * Alanna Scott (Airbnb)
> >
> > * Bogdan Kyryliuk (Airbnb)
> >
> > * Vera Liu  (Airbnb)
> >
> > * Jeff Feng (Airbnb)
> >
> > * Ashutosh Chauhan (Hortonworks)
> >
> > * Nishant Bangarwa (Hortonworks)
> >
> > * Slim Bouguerra (Hortonworks)
> >
> > * Priyank Shah (Hortonworks)
> >
> > * Sriharsha Chintalapani (Hortonworks)
> >
> > * Daniel Dai (Hortonworks)
> >
> > We realize that additional employer diversity is needed, and we will work
> > aggressively to recruit developers from additional companies.
> >
> > === Alignment ===
> >
> > The initial committers strongly believe that a system for interactive
> > visualization of data will gain broader adoption as an open source,
> > community driven project, where the community can contribute not only to
> > the core components, but also to a growing collection of connectors,
> > visualizations and improving integration with all potential data sources.
> > Superset already integrates closely with Apache Hive, the Hive metastore,
> > as well as most SQL-speaking databases found in modern data ecosystems.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> >
> > Superset is a vital component for visualizing, accessing and
> > democratizing data at Airbnb.  Also at Hortonworks, Superset is a core
> > component of the DataFlow product offering.  Thus, the risk of the
> > project being orphaned is relatively low.  The project could be at risk
> > if Airbnb changes their approach for democratizing data or if Hortonworks
> > changes their strategy in the market.  In such an event, the committers
> > plan to continue working on the project on their own time, though the
> > progress will likely be slower.  We plan to mitigate this risk by
> > recruiting additional committers.
> >
> > === Inexperience with Open Source ===
> >
> > The initial committers include veteran Apache members (committers and PMC
> > members) and other developers who have varying degrees of experience with
> > open source projects. All have been involved with source code that has
> > been released under an open source license, and several also have
> > experience developing code with an open source development process.
> >
> > === Homogenous Developers ===
> >
> > The initial committers are employed by Airbnb Inc. and Hortonworks. We
> > are committed to recruiting additional committers from other companies.
> >
> > === Reliance on Salaried Developers ===
> >
> > It is expected that Superset development will occur on both salaried time
> > and on volunteer time, after hours. The majority of initial committers
> > are paid by their employer to contribute to this project. However, they
> > are all passionate about the project, and we are confident that the
> > project will continue even if no salaried developers contribute to the
> > project. We are committed to recruiting additional committers including
> > non-salaried developers.
> >
> > === Relationships with Other Apache Products ===
> >
> > To the knowledge of the Initial Committers, there are no direct
> > competitors to Superset within the Apache Software Foundation.  That
> > said, Apache Zeppelin is an indirect competitor, but it solves a
> > different use case.
> >
> > Apache Zeppelin is a web-based notebook that enables interactive data
> > analytics. It enables the creation of beautiful data-driven, interactive
> > and collaborative documents with SQL, Scala and more.  Although a user
> > can create data visualizations using this project, it leverages a
> > notebook-style user interface and is geared towards the Spark community,
> > where Scala and SQL co-exist.
> >
> > We look forward to collaborating with those communities, as well as other
> > Apache communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> >
> > Superset is solving two huge challenges:
> >
> > The challenge of enabling every knowledge worker to make data informed
> > decisions, particularly those who are not deeply skilled at writing SQL.
> >
> > The challenge of visualizing huge amounts of data interactively and in
> > real-time
> >
> > Superset was first developed as a data visualization solution for
> > Druid.io as a way to visualize billions of rows of data.  Since then,
> > usage of Superset has expanded to address data visualization use cases
> > across SQL speaking data sources as well.
> >
> > Our rationale for developing Superset as an Apache project is detailed in
> > the Rationale Section.  We believe that the Apache brand and community
> > process will help us attract more contributors to this project, and help
> > grow the footprint of the project through usage at other organizations
> > and within other applications.  Establishing consensus among users and
> > developers will result in a more valuable tool for everyone.
> >
> > == Documentation ==
> >
> > References to further reading material:
> >
> > * [[http://airbnb.io/superset/|Superset Documentation]]
> >
> > * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post:  Superset: Airbnb’s Data Exploration Platform]]
> >
> > * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post:  Superset: Scaling Data Access & Visual Insights at Airbnb]]
> >
> > == Initial Source ==
> >
> > The origin of the proposed code base can be found at
> > https://github.com/airbnb/superset.  The code base is primarily in
> > Python.
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > We do not expect any complications for the submission of the Superset
> > code base.  Our code is already in Github and there is only a single code
> > base.
> >
> > == External Dependencies ==
> >
> > List of Python packages, from the Python Package Index (Pypi):
> >
> > * boto3
> >
> > * celery
> >
> > * cryptography
> >
> > * flask-appbuilder
> >
> > * flask-cache
> >
> > * flask-migrate
> >
> > * flask-script
> >
> > * flask-sqlalchemy
> >
> > * flask-testing
> >
> > * humanize
> >
> > * gunicorn
> >
> > * markdown
> >
> > * pandas
> >
> > * parsedatetime
> >
> > * pydruid
> >
> > * PyHive
> >
> > * python-dateutil
> >
> > * requests
> >
> > * simplejson
> >
> > * six
> >
> > * sqlalchemy
> >
> > * sqlalchemy-utils
> >
> > * sqlparse
> >
> > * thrift
> >
> > * thrift-sasl
> >
> > * werkzeug
> >
> > List of Javascript packages, from NPM:
> >
> > * autobind-decorator
> >
> > * bootstrap
> >
> > * bootstrap-datepicker
> >
> > * brace
> >
> > * brfs
> >
> > * cal-heatmap
> >
> > * classnames
> >
> > * d3
> >
> > * d3-cloud
> >
> > * d3-sankey
> >
> > * d3-scale
> >
> > * d3-tip
> >
> > * datamaps
> >
> > * datatables-bootstrap3-plugin
> >
> > * datatables.net-bs
> >
> > * font-awesome
> >
> > * gridster
> >
> > * immutability-helper
> >
> > * immutable
> >
> > * jquery
> >
> > * lodash.throttle
> >
> > * mapbox-gl
> >
> > * moment
> >
> > * moments
> >
> > * mustache
> >
> > * nvd3
> >
> > * react
> >
> > * react-ace
> >
> > * react-bootstrap
> >
> > * react-bootstrap-table
> >
> > * react-dom
> >
> > * react-draggable
> >
> > * react-gravatar
> >
> > * react-grid-layout
> >
> > * react-map-gl
> >
> > * react-redux
> >
> > * react-resizable
> >
> > * react-select
> >
> > * react-syntax-highlighter
> >
> > * reactable
> >
> > * redux
> >
> > * redux-localstorage
> >
> > * redux-thunk
> >
> > * shortid
> >
> > * style-loader
> >
> > * supercluster
> >
> > * topojson
> >
> > * victory
> >
> > * viewport-mercator-project
> >
> > == Cryptography ==
> >
> > The proposal does not include cryptographic code.
> >
> > == Required Resources ==
> >
> > === Mailing List ===
> >
> > There is a current mailing list as a Google Group “airbnb_superset” that
> > we are planning on deprecating once the Apache.org lists are ready to
> > serve our community.
> >
> > * superset-private
> >
> > * superset-dev
> >
> > * superset-user
> >
> > === Subversion Directory ===
> >
> > Git is the preferred source control system.
> > http://svn.apache.org/repos/asf/incubator/superset
> >
> > == Git Repository ==
> >
> > Git is the preferred source control system; we’re assuming
> > https://github.com/apache/incubator-superset based on the naming scheme.
> >
> > == Issue Tracking ==
> >
> > JIRA Superset (SUPERSET). If possible, we’d like to use Github issues &
> > PRs to manage our project as much as possible. It’s been said that there
> > are ways to keep Github’s issues in sync with Jira, allowing us to get
> > the best of both worlds. If that is not possible, we will comply and use
> > Jira.
> >
> > == Other Resources ==
> >
> > We currently use a set of Github integrated services that are free to the
> > open source community, like Travis-ci, Code Climate, Coveralls,
> > Landscape.io, Requires.io, david-dm and Gitter. We would like to keep
> > using these services as they allow us to scale contributions and optimize
> > our development flows. These services require some elevated rights on the
> > Github repository in order to set up or tune and we would like for the
> > committers to have the required rights.
> >
> >
> > == Initial Committers ==
> >
> > * Maxime Beauchemin <maxime.beauchemin@airbnb.com> - PMC & Committer
> >
> > * Alanna Scott <alanna.scott@airbnb.com> - PMC & Committer
> >
> > * Bogdan Kyryliuk <b.kyryliuk@gmail.com> - PMC & Committer
> >
> > * Vera Liu <vera.liu@airbnb.com> - Committer
> >
> > * Jeff Feng <jeff.feng@airbnb.com> - PMC & Committer
> >
> > * Ashutosh Chauhan <hashutosh@apache.org> - Mentor & Committer
> >
> > * Nishant Bangarwa <nbangarwa@hortonworks.com> - PMC & Committer
> >
> > * Slim Bouguerra <sbouguerra@hortonworks.com> - Committer
> >
> > * Priyank Shah <pshah@hortonworks.com> - Committer
> >
> > * Harsha Chintalapani <schintalapani@hortonworks.com> - Committer
> >
> > * Daniel Dai <daijy@apache.org> - Champion & Committer
> >
> > == Affiliations ==
> >
> > The initial committers are employees of Airbnb Inc. and Hortonworks.
> >
> > == Sponsors ==
> >
> > === Champion ===
> >
> > Daniel Dai <daijy@apache.org>
> >
> > === Nominated Mentors ===
> >
> > Ashutosh Chauhan <hashutosh@apache.org>
> >
> > === Sponsoring Entity ===
> >
> > Incubator PMC
> >
> >
> > --
> >
> > *Jeff Feng*
> > Product Manager
> > m: (949)-610-5108
> > twitter: @jtfeng
> >
>
