incubator-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [VOTE] Superset Proposal for Apache Incubator
Date Wed, 26 Apr 2017 23:36:26 GMT
+1 (binding)



On Tue, Apr 25, 2017 at 1:58 PM, Joe Witt <joe.witt@gmail.com> wrote:

> +1 (binding)
>
> On Tue, Apr 25, 2017 at 4:52 PM, Jitendra Pandey
> <jitendra@hortonworks.com> wrote:
> > +1 (binding)
> >
> > On 4/25/17, 1:27 PM, "Julian Hyde" <jhyde@apache.org> wrote:
> >
> >     +1 binding
> >
> >     > On Apr 25, 2017, at 12:48 PM, moon soo Lee <moon@apache.org>
> wrote:
> >     >
> >     > +1 (non-binding)
> >     >
> >     > On Tue, Apr 25, 2017 at 11:49 AM Ashutosh Chauhan <
> hashutosh@apache.org>
> >     > wrote:
> >     >
> >     >> +1 (binding)
> >     >>
> >     >> Thanks,
> >     >> Ashutosh
> >     >>
> >     >> On Mon, Apr 24, 2017 at 5:45 AM, Luke Han <luke.hq@gmail.com>
> wrote:
> >     >>
> >     >>> +1 binding
> >     >>>
> >     >>> Love to see Superset to be new incubator project.
> >     >>>
> >     >>>
> >     >>> Best Regards!
> >     >>> ---------------------
> >     >>>
> >     >>> Luke Han
> >     >>>
> >     >>> On Sun, Apr 23, 2017 at 10:53 PM, Jeff Feng <jeff.feng@gmail.com>
> wrote:
> >     >>>
> >     >>>> Dear Apache Incubator Community,
> >     >>>>
> >     >>>> We have updated the Superset proposal
> >     >>>> <https://wiki.apache.org/incubator/SupersetProposal> (copied below) for
> >     >>>> Apache Incubation with an additional mentor (Luke Han -
> >     >>>> luke.han@apache.org),
> >     >>>> and would like to start a vote thread for acceptance into the incubator.
> >     >>>>
> >     >>>> Our team is excited to share Superset with the Apache community and we
> >     >>>> hope for your continued support!
> >     >>>>
> >     >>>> Cheers,
> >     >>>> Jeff & the Superset Team
> >     >>>>
> >     >>>>
> >     >>>>
> >     >>>>
> >     >>>> = Superset =
> >     >>>>
> >     >>>> == Abstract ==
> >     >>>> Superset is an enterprise-ready web application for data
> exploration,
> >     >> data
> >     >>>> visualization and dashboarding.
> >     >>>>
> >     >>>> == Proposal ==
> >     >>>> Superset is business intelligence (BI) software that helps modern
> >     >>>> organizations visualize and interact with their data. Superset enables
> >     >>>> users to explore data from a variety of databases, assemble beautiful
> >     >>>> dashboards and share their findings.  Superset works neatly with all
> >     >>>> modern SQL-speaking databases, and integrates with Druid.io to provide
> >     >>>> real-time, interactive, blazing fast data access to large datasets.
> >     >>>>
> >     >>>> == Background ==
> >     >>>> Data is mission critical. To succeed in this era, organizations
> need to
> >     >>>> provide low-friction, intuitive and interactive access to data.
> It is
> >     >>>> paramount for knowledge workers to be capable of answering
> their own
> >     >>>> questions by querying, exploring and visualizing data.
> >     >>>>
> >     >>>> The entire business intelligence industry has pivoted from a model of
> >     >>>> centralized top-down platforms driven by IT organizations to self-service
> >     >>>> analytics and agile workflows by any user.  This shift unblocks centralized
> >     >>>> service bottlenecks for creating data visualizations while also creating an
> >     >>>> environment that is iterative and fast-moving.  This means that business
> >     >>>> intelligence software must also be easy and delightful to use.
> >     >>>> Self-service analytics doesn’t mean that admin and governance features are
> >     >>>> not needed.
> >     >>>> Modern BI tools provide fine-grained access controls and auditing
> >     >>>> capabilities to understand how data is being used.  Superset is a solution
> >     >>>> that delivers on all of these vectors.
> >     >>>>
> >     >>>> The technology stack is also constantly morphing - vendors are struggling
> >     >>>> to provide cheap, quick and easy solutions to access data.  Business
> >     >>>> intelligence users are finding existing solutions lacking as these software
> >     >>>> products either disregard or react slowly to recent game-changing
> >     >>>> technologies like Druid.io, PrestoDB, Apache Drill, Apache Kylin, d3.js,
> >     >>>> React.js and iPython’s Jupyter for instance.
> >     >>>>
> >     >>>> == Rationale ==
> >     >>>> Business intelligence is more relevant today than at any other
> point in
> >     >>>> history.  Organizations are currently very limited in options
> for open
> >     >>>> source data visualization solutions, especially solutions that
> are both
> >     >>>> self-service and enterprise-ready.  Every company informing
> their
> >     >>>> decisions
> >     >>>> with data needs a BI tool.
> >     >>>>
> >     >>>> We believe that Superset will be a strong complement to existing Apache
> >     >>>> Software Foundation technologies by offering scalable user interactions to
> >     >>>> distributed storage and computation solutions.  Users will often find that
> >     >>>> Superset can act as a catalyst for tooling that can visualize the byproduct
> >     >>>> of data and computation infrastructure.
> >     >>>>
> >     >>>> Superset has many key design elements that help fill a gap in current
> >     >>>> solutions for organizations:
> >     >>>> * Easy, low friction access to data through a simple, web-based data
> >     >>>> exploration interface.  Composing charts and dashboards is intuitive.
> >     >>>> Eliminating the need to write code or SQL empowers anyone to use it.
> >     >>>> * Access to a wide array of rich, interactive data visualization types.
> >     >>>> * Enterprise-ready: Integration with different authentication mechanisms
> >     >>>> and granular permissions centered around actions and data access.
> >     >>>> * Realtime & fast: Superset provides realtime analytics at the speed of
> >     >>>> thought on very large datasets when integrated with Druid.io.
> >     >>>> * Broad data access: Consume data out of any SQL-speaking relational
> >     >>>> database.
> >     >>>> * Extensible: Can be extended to talk to many NoSQL databases like Apache
> >     >>>> Drill, Elastic Search, and other popular database engines.
> >     >>>> * Fast loading dashboards with configurable web-scale caching.
> >     >>>> * Plug-in framework that enables organizations to build custom analytical
> >     >>>> applications with new UI/UX interfaces.
> >     >>>> * SQL Lab, a state-of-the-art SQL IDE that empowers SQL-speaking users
> >     >>>> with more flexibility.  SQL Lab integrates with the visualization engine
> >     >>>> seamlessly.
> >     >>>>
> >     >>>> == Initial Goals ==
> >     >>>> The initial goals of the Superset project are several-fold:
> >     >>>> * Move the existing codebase to Apache and integrate with the
> Apache
> >     >>>> development process.
> >     >>>> * Redesign the user interface and interaction model for creating
> >     >>>> visualizations/dashboards and connecting to data sources
> >     >>>> * Build robust support for security and governance of the tool
> >     >> including
> >     >>>> popular authorization modules (including Apache Ranger and
> Apache
> >     >> Sentry)
> >     >>>> and a more sophisticated permissions system
> >     >>>> * Grow the extensibility of the project both in terms of
> enhanced
> >     >>>> connectivity to NoSQL-based data sources and creating a plug-in
> >     >> framework
> >     >>>> that enables organizations to build custom analytical
> applications which
> >     >>>> require a new UI/UX
> >     >>>>
> >     >>>> == Current Status ==
> >     >>>> By many standards, Superset is already a successful open source
> project.
> >     >>>> As
> >     >>>> of March 2017, Superset is officially used in production at
> about a
> >     >> dozen
> >     >>>> companies, has received contributions from over one hundred
> contributors
> >     >>>> on
> >     >>>> Github, 1500+ forks, and 12k+ stars.
> >     >>>>
> >     >>>> Sizeable companies like Airbnb, Yahoo! and Hortonworks have made
> >     >>>> significant contributions, and expressed their commitment to the project.
> >     >>>> The product is feature complete and has been viable for months.  It already
> >     >>>> serves as the main interface for consuming data at many companies of
> >     >>>> different sizes.
> >     >>>>
> >     >>>> While the product is usable, there’s room for improvement
> across the
> >     >>>> board,
> >     >>>> starting with providing a smoother user experience around
> content
> >     >>>> creation,
> >     >>>> making sure all features work out-of-the-box on more platforms
> and
> >     >>>> databases, providing better user training guides and videos,
> having a
> >     >>>> predictable release process, and increasing the overall quality
> of the
> >     >>>> Superset releases.
> >     >>>>
> >     >>>> === Meritocracy ===
> >     >>>> We plan to invest in supporting a meritocracy. We will discuss the
> >     >>>> requirements in an open forum. Several companies have expressed interest in
> >     >>>> this project, and we intend to invite additional developers to participate.
> >     >>>> We will encourage and monitor community participation so that privileges
> >     >>>> can be extended to those that contribute.
> >     >>>>
> >     >>>> === Community ===
> >     >>>> The need for an enterprise-ready data visualization and exploration
> >     >>>> platform in the open source community is tremendous.  While Superset is
> >     >>>> fairly well known, recognized and used within the Druid.io community,
> >     >>>> adoption is currently limited outside of that niche. There is a huge
> >     >>>> opportunity to grow the community to hundreds if not thousands of
> >     >>>> organizations, and we are hoping that embracing “the Apache way” will
> >     >>>> accelerate the growth of our community.
> >     >>>>
> >     >>>> We have already been active in seeking and inviting contributions, and are
> >     >>>> planning to scale the project by investing time and growing the support
> >     >>>> structure to grow the community.
> >     >>>>
> >     >>>> === Core Developers ===
> >     >>>> The initial committers for Superset include experienced full
> stack,
> >     >>>> front-end and data engineers:
> >     >>>> * Maxime Beauchemin (Airbnb)
> >     >>>> * Alanna Scott (Airbnb)
> >     >>>> * Bogdan Kyryliuk (Airbnb)
> >     >>>> * Vera Liu  (Airbnb)
> >     >>>> * Jeff Feng (Airbnb)
> >     >>>> * Ashutosh Chauhan (Hortonworks)
> >     >>>> * Nishant Bangarwa (Hortonworks)
> >     >>>> * Slim Bouguerra (Hortonworks)
> >     >>>> * Priyank Shah (Hortonworks)
> >     >>>> * Sriharsha Chintalapani (Hortonworks)
> >     >>>> * Daniel Dai (Hortonworks)
> >     >>>>
> >     >>>> We realize that additional employer diversity is needed, and we will work
> >     >>>> aggressively to recruit developers from additional companies.
> >     >>>>
> >     >>>> === Alignment ===
> >     >>>> The initial committers strongly believe that a system for interactive
> >     >>>> visualization of data will gain broader adoption as an open source,
> >     >>>> community driven project, where the community can contribute not only to
> >     >>>> the core components, but also to a growing collection of connectors,
> >     >>>> visualizations and improving integration with all potential data sources.
> >     >>>> Superset already integrates closely with Apache Hive, the Hive metastore,
> >     >>>> as well as most SQL-speaking databases found in modern data ecosystems.
> >     >>>>
> >     >>>> == Known Risks ==
> >     >>>>
> >     >>>> === Orphaned Products ===
> >     >>>> Superset is a vital component for visualizing, accessing and
> >     >>>> democratizing data at Airbnb.  Also at Hortonworks, Superset is a core
> >     >>>> component of the DataFlow product offering.  Thus, the risk of the project
> >     >>>> being orphaned is relatively low.  The project could be at risk if Airbnb
> >     >>>> changes their approach for democratizing data or if Hortonworks changes
> >     >>>> their strategy in the market.  In such an event, the committers plan to
> >     >>>> continue working on the project on their own time, though the progress
> >     >>>> will likely be slower.  We plan to mitigate this risk by recruiting
> >     >>>> additional committers.
> >     >>>>
> >     >>>> === Inexperience with Open Source ===
> >     >>>> The initial committers include veteran Apache members
> (committers and
> >     >> PPMC
> >     >>>> members) and other developers who have varying degrees of
> experience
> >     >> with
> >     >>>> open source projects. All have been involved with source code
> that has
> >     >>>> been
> >     >>>> released under an open source license, and several also have
> experience
> >     >>>> developing code with an open source development process.
> >     >>>>
> >     >>>> === Homogenous Developers ===
> >     >>>> The initial committers are employed by Airbnb Inc. and
> Hortonworks. We
> >     >> are
> >     >>>> committed to recruiting additional committers from other
> companies.
> >     >>>>
> >     >>>> === Reliance on Salaried Developers ===
> >     >>>> It is expected that Superset development will occur on both
> salaried
> >     >> time
> >     >>>> and on volunteer time, after hours. The majority of initial
> committers
> >     >> are
> >     >>>> paid by their employer to contribute to this project. However,
> they are
> >     >>>> all
> >     >>>> passionate about the project, and we are confident that the
> project will
> >     >>>> continue even if no salaried developers contribute to the
> project. We
> >     >> are
> >     >>>> committed to recruiting additional committers including
> non-salaried
> >     >>>> developers.
> >     >>>>
> >     >>>> === Relationships with Other Apache Products ===
> >     >>>> To the knowledge of the Initial Committers, there are no direct
> >     >>>> competitors
> >     >>>> to Superset within the Apache Software Foundation.  That said,
> Apache
> >     >>>> Zeppelin is an indirect competitor, but it solves a different
> use case.
> >     >>>>
> >     >>>> Apache Zeppelin is a web-based notebook that enables interactive data
> >     >>>> analytics. It enables the creation of beautiful data-driven, interactive
> >     >>>> and collaborative documents with SQL, Scala and more.  Although a user can
> >     >>>> create data visualizations using this project, it leverages a notebook
> >     >>>> style user interface and it is geared towards the Spark community where
> >     >>>> Scala and SQL co-exist.
> >     >>>>
> >     >>>> We look forward to collaborating with those communities, as
> well as
> >     >> other
> >     >>>> Apache communities.
> >     >>>>
> >     >>>> === An Excessive Fascination with the Apache Brand ===
> >     >>>> Superset is solving two huge challenges:
> >     >>>> * The challenge of enabling every knowledge worker to make data informed
> >     >>>> decisions, particularly those who are not deeply skilled at writing SQL.
> >     >>>> * The challenge of visualizing huge amounts of data interactively and in
> >     >>>> real-time.
> >     >>>>
> >     >>>> Superset was first developed as a data visualization solution
> for
> >     >> Druid.io
> >     >>>> as a way to visualize billions of rows of data.  Since then,
> usage of
> >     >>>> Superset has expanded to address data visualization use cases
> across SQL
> >     >>>> speaking data sources as well.
> >     >>>>
> >     >>>> Our rationale for developing Superset as an Apache project is detailed in
> >     >>>> the Rationale Section.  We believe that the Apache brand and community
> >     >>>> process will help us attract more contributors to this project, and help
> >     >>>> grow the footprint of the project through usage at other organizations and
> >     >>>> within other applications.  Establishing consensus among users and
> >     >>>> developers will result in a more valuable tool for everyone.
> >     >>>>
> >     >>>> == Documentation ==
> >     >>>> References to further reading material:
> >     >>>> * [[http://airbnb.io/superset/|Superset Documentation]]
> >     >>>> * [[https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5#.npqmmbu25|Blog Post:  Superset: Airbnb’s Data Exploration Platform]]
> >     >>>> * [[https://medium.com/airbnb-engineering/superset-scaling-data-access-and-visual-insights-at-airbnb-3ce3e9b88a7f#.a505zvb1t|Blog Post:  Superset: Scaling Data Access & Visual Insights at Airbnb]]
> >     >>>>
> >     >>>> == Initial Source ==
> >     >>>> The origin of the proposed code base can be found at
> >     >>>> https://github.com/airbnb/superset.  The code base is
> primarily in
> >     >>>> Python.
> >     >>>>
> >     >>>> == Source and Intellectual Property Submission Plan ==
> >     >>>> We do not expect any complications for the submission of the
> Superset
> >     >> code
> >     >>>> base.  Our code is already in Github and there is only a single
> code
> >     >> base.
> >     >>>>
> >     >>>> == External Dependencies ==
> >     >>>> List of Python packages, from the Python Package Index (Pypi):
> >     >>>>
> >     >>>> * boto3
> >     >>>> * celery
> >     >>>> * cryptography
> >     >>>> * flask-appbuilder
> >     >>>> * flask-cache
> >     >>>> * flask-migrate
> >     >>>> * flask-script
> >     >>>> * flask-sqlalchemy
> >     >>>> * flask-testing
> >     >>>> * humanize
> >     >>>> * gunicorn
> >     >>>> * markdown
> >     >>>> * pandas
> >     >>>> * parsedatetime
> >     >>>> * pydruid
> >     >>>> * PyHive
> >     >>>> * python-dateutil
> >     >>>> * requests
> >     >>>> * simplejson
> >     >>>> * six
> >     >>>> * sqlalchemy
> >     >>>> * sqlalchemy-utils
> >     >>>> * sqlparse
> >     >>>> * thrift
> >     >>>> * thrift-sasl
> >     >>>> * werkzeug
> >     >>>>
> >     >>>> List of Javascript packages, from NPM:
> >     >>>> * autobind-decorator
> >     >>>> * bootstrap
> >     >>>> * bootstrap-datepicker
> >     >>>> * brace
> >     >>>> * brfs
> >     >>>> * cal-heatmap
> >     >>>> * classnames
> >     >>>> * d3
> >     >>>> * d3-cloud
> >     >>>> * d3-sankey
> >     >>>> * d3-scale
> >     >>>> * d3-tip
> >     >>>> * datamaps
> >     >>>> * datatables-bootstrap3-plugin
> >     >>>> * datatables.net-bs
> >     >>>> * font-awesome
> >     >>>> * gridster
> >     >>>> * immutability-helper
> >     >>>> * immutable
> >     >>>> * jquery
> >     >>>> * lodash.throttle
> >     >>>> * mapbox-gl
> >     >>>> * moment
> >     >>>> * moments
> >     >>>> * mustache
> >     >>>> * nvd3
> >     >>>> * react
> >     >>>> * react-ace
> >     >>>> * react-bootstrap
> >     >>>> * react-bootstrap-table
> >     >>>> * react-dom
> >     >>>> * react-draggable
> >     >>>> * react-gravatar
> >     >>>> * react-grid-layout
> >     >>>> * react-map-gl
> >     >>>> * react-redux
> >     >>>> * react-resizable
> >     >>>> * react-select
> >     >>>> * react-syntax-highlighter
> >     >>>> * reactable
> >     >>>> * redux
> >     >>>> * redux-localstorage
> >     >>>> * redux-thunk
> >     >>>> * shortid
> >     >>>> * style-loader
> >     >>>> * supercluster
> >     >>>> * topojson
> >     >>>> * victory
> >     >>>> * viewport-mercator-project
> >     >>>>
> >     >>>> == Cryptography ==
> >     >>>> The proposal does not include cryptographic code.
> >     >>>>
> >     >>>> == Required Resources ==
> >     >>>>
> >     >>>> === Mailing List ===
> >     >>>> There is a current mailing list as a Google Group “airbnb_superset” that
> >     >>>> we are planning on deprecating as the Apache.org lists become ready to
> >     >>>> serve our community.
> >     >>>>
> >     >>>> * superset-private
> >     >>>> * superset-dev
> >     >>>> * superset-user
> >     >>>>
> >     >>>> === Subversion Directory ===
> >     >>>> Git is the preferred source control system.
> >     >>>> http://svn.apache.org/repos/asf/incubator/superset
> >     >>>>
> >     >>>> == Git Repository ==
> >     >>>> Git is the preferred source control system, we’re assuming
> >     >>>> https://github.com/apache/incubator-superset based on the
> naming scheme
> >     >>>>
> >     >>>> == Issue Tracking ==
> >     >>>> JIRA Superset (SUPERSET). We’d like to use Github issues & PRs to manage
> >     >>>> our project as much as possible. It’s been said that there are ways to
> >     >>>> keep Github’s issues in sync with Jira, allowing us to get the best of
> >     >>>> both worlds. If that is not possible, we will comply by using Jira.
> >     >>>>
> >     >>>> == Other Resources ==
> >     >>>> We currently use a set of Github integrated services that are
> free to
> >     >> the
> >     >>>> open source community, like Travis-ci, Code Climate, Coveralls,
> >     >>>> Landscape.io, Requires.io, david-dm and Gitter. We would like
> to keep
> >     >>>> using
> >     >>>> these services as they allow us to scale contributions and
> optimize our
> >     >>>> development flows. These services require some elevated rights
> on the
> >     >>>> Github repository in order to set up or tune and we would like
> for the
> >     >>>> committers to have the required rights.
> >     >>>>
> >     >>>>
> >     >>>> == Initial Committers ==
> >     >>>>
> >     >>>> * Maxime Beauchemin <maxime.beauchemin@airbnb.com> - PPMC & Committer
> >     >>>> * Alanna Scott <alanna.scott@airbnb.com> - PPMC & Committer
> >     >>>> * Bogdan Kyryliuk <b.kyryliuk@gmail.com> - PPMC & Committer
> >     >>>> * Vera Liu <vera.liu@airbnb.com> - Committer
> >     >>>> * Jeff Feng <jeff.feng@airbnb.com> - PPMC & Committer
> >     >>>> * Ashutosh Chauhan <hashutosh@apache.org> - Mentor & Committer
> >     >>>> * Nishant Bangarwa <nbangarwa@hortonworks.com> - PPMC & Committer
> >     >>>> * Slim Bouguerra <sbouguerra@hortonworks.com> - Committer
> >     >>>> * Priyank Shah <pshah@hortonworks.com> - Committer
> >     >>>> * Harsha Chintalapani <schintalapani@hortonworks.com> - Committer
> >     >>>> * Daniel Dai <daijy@apache.org> - Champion & Committer
> >     >>>> * Luke Han <luke.han@apache.org> - Mentor
> >     >>>>
> >     >>>> == Affiliations ==
> >     >>>> The initial committers are employees of Airbnb Inc. and
> Hortonworks.
> >     >>>>
> >     >>>> == Sponsors ==
> >     >>>>
> >     >>>> === Champion ===
> >     >>>> Daniel Dai <daijy@apache.org>
> >     >>>>
> >     >>>> === Nominated Mentors ===
> >     >>>> * Ashutosh Chauhan <hashutosh@apache.org>
> >     >>>> * Luke Han <luke.han@apache.org>
> >     >>>>
> >     >>>> === Sponsoring Entity ===
> >     >>>> Incubator PMC
> >     >>>>
> >     >>>
> >     >>>
> >     >>
> >
> >
> >     ---------------------------------------------------------------------
> >     To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >     For additional commands, e-mail: general-help@incubator.apache.org
> >
> >
> >
> >
