incubator-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [VOTE] Accept Kylin into the Apache Incubator
Date Fri, 21 Nov 2014 06:45:51 GMT
+1 (binding)



On Fri, Nov 21, 2014 at 3:37 AM, Andrew Purtell <apurtell@apache.org> wrote:

> +1 (binding)
>
> On Thu, Nov 20, 2014 at 2:31 PM, Luke Han <luke.hq@gmail.com> wrote:
>
> > Following the discussion earlier in the thread:
> >
> >
> >
> > http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3cCAKmQrOb22+n+R++dAte33f3pCpyUjhFoEaqRMS3T-UdJwK66GQ@mail.gmail.com%3e
> >
> > I would like to call a VOTE for accepting Kylin as a new incubator
> > project.
> >
> > The proposal is available at:
> > https://wiki.apache.org/incubator/KylinProposal
> >
> > The text of the proposal is also included below.
> >
> > Vote is open until 24th November 2014, 23:59:00 UTC
> >
> > [ ] +1 accept Kylin in the Incubator
> > [ ] ±0
> > [ ] -1 because...
> >
> >
> > Thanks
> > Luke
> >
> >
> > Kylin Proposal
> > ==============
> >
> > # Abstract
> >
> > Kylin is a distributed and scalable OLAP engine built on Hadoop to
> > support extremely large datasets.
> >
> > # Proposal
> >
> > Kylin is an open source Distributed Analytics Engine that provides a
> > SQL interface and multi-dimensional analysis (MOLAP) on Hadoop. Kylin
> > is designed to accelerate analytics on Hadoop by allowing the use of
> > SQL-compatible tools, to support extremely large datasets, and to
> > integrate tightly with the Hadoop ecosystem.
> >
> > ## Overview of Kylin
> >
> > The Kylin platform has two parts: offline data processing and
> > interactive querying. First, Kylin reads data from the source, Hive,
> > and runs a set of tasks, including MapReduce jobs and shell scripts,
> > to pre-calculate results for a specified data model, then saves the
> > resulting OLAP cube into storage such as HBase. Once these OLAP cubes
> > are ready, a user can submit a request from any SQL-based tool or
> > third-party application to Kylin’s REST server. The server calls the
> > Query Engine to determine whether the target dataset already exists.
> > If so, the engine directly accesses the target data in the form of a
> > predefined cube and returns the result with sub-second latency.
> > Otherwise, the engine is designed to route non-matching queries to
> > whichever SQL-on-Hadoop tool is already available on the Hadoop
> > cluster, such as Hive.
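
A minimal sketch of the routing decision described above, for illustration only. The function name `route_query`, the tuple-of-dimensions representation of a cuboid, and the rule that a cuboid can serve a query when it contains every requested dimension are all assumptions; none of them is taken from Kylin's code.

```python
def route_query(requested_dims, available_cuboids):
    """Pick a serving path for a query (hypothetical helper, not Kylin's API).

    Returns ("cube", cuboid) when a precomputed cuboid contains every
    requested dimension, otherwise ("fallback", None), which stands in for
    handing the query to a SQL-on-Hadoop tool such as Hive.
    """
    needed = frozenset(requested_dims)
    # Keep only the cuboids that cover all requested dimensions.
    candidates = [c for c in available_cuboids if needed <= frozenset(c)]
    if candidates:
        # Assumption: the cuboid with the fewest dimensions is cheapest to read.
        return "cube", min(candidates, key=len)
    return "fallback", None


cuboids = [("site",), ("site", "category"), ("site", "category", "day")]
print(route_query(["category", "site"], cuboids))
print(route_query(["seller"], cuboids))
```

The first call is served from the ("site", "category") cuboid; the second has no covering cuboid and takes the fallback path.
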
> >
> > Kylin platform includes:
> >
> > - Metadata Manager: Kylin is a metadata-driven application. The Kylin
> > Metadata Manager is the key component that manages all metadata stored
> > in Kylin including all cube metadata. All other components rely on the
> > Metadata Manager.
> >
> > - Job Engine: This engine is designed to handle all of the offline
> > jobs, including shell script, Java API, and MapReduce jobs. The Job
> > Engine manages and coordinates all of the jobs in Kylin to make sure
> > that each job executes and that failures are handled.
> >
> > - Storage Engine: This engine manages the underlying storage –
> > specifically, the cuboids, which are stored as key-value pairs. The
> > Storage Engine uses HBase – the best solution from the Hadoop
> > ecosystem for leveraging an existing K-V system. Kylin can also be
> > extended to support other K-V systems, such as Redis.
> >
> > - Query Engine: Once the cube is ready, the Query Engine can receive
> > and parse user queries. It then interacts with other components to
> > return the results to the user.
> >
> > - REST Server: The REST Server is an entry point for applications to
> > develop against Kylin. Applications can submit queries, get results,
> > trigger cube build jobs, get metadata, get user privileges, and so on.
> >
> > - ODBC Driver: To support third-party tools and applications – such as
> > Tableau – we have built and open-sourced an ODBC Driver. The goal is
> > to make it easy for users to onboard.
> >
> > # Background
> >
> > The challenge we face at eBay is that our data volume is becoming
> > bigger and bigger while our user base is becoming more diverse. For
> > example, our business users and analysts consistently ask for minimal
> > latency when visualizing data in Tableau and Excel. So, we worked
> > closely with our internal analyst community and outlined the product
> > requirements for Kylin:
> >
> > - Sub-second query latency on billions of rows
> > - ANSI SQL availability for those using SQL-compatible tools
> > - Full OLAP capability to offer advanced functionality
> > - Support for high cardinality and very large dimensions
> > - High concurrency for thousands of users
> > - Distributed and scale-out architecture for analysis in the TB to PB
> > size range
> >
> > Existing SQL-on-Hadoop solutions commonly need to perform partial or
> > full table or file scans to compute the results of queries. The cost
> > of these large data scans can make many queries very slow (more than a
> > minute). The core idea of MOLAP (multi-dimensional OLAP) is to
> > pre-compute data along dimensions of interest and store resulting
> > aggregates as a "cube". MOLAP is much faster but is inflexible. We
> > realized that no existing external product met our exact
> > requirements, especially in the open source Hadoop community. To meet
> > our emerging business needs, we built a platform from scratch to
> > support MOLAP for these business requirements, and later to support
> > others, including ROLAP. With an excellent development team and several
> > pilot customers, we have been able to bring the Kylin platform into
> > production as well as open source it.
> >
> > # Rationale
> >
> > When data grows to petabyte scale, pre-calculating a query takes a
> > long time and requires costly, powerful hardware. However,
> > with the benefit of Hadoop’s distributed computing architecture, jobs
> > can leverage hundreds or thousands of Hadoop data nodes. There still
> > exists a big gap between the growing volume of data and interactive
> > analytics:
> >
> > - Existing Business Intelligence (OLAP) platforms cannot scale out to
> > support fast-growing data.
> > - Existing SQL-on-Hadoop projects are not designed for OLAP use cases;
> > joins of huge tables will always take a long time to scan and calculate.
> > - No mature OLAP solution exists on Hadoop.
> >
> > As mentioned in the background, the business requirements triggered by
> > the increase in data volume drove eBay to invest in building a solution
> > from scratch to offer analytics capability on a Hadoop cluster. With
> > Hadoop’s power of distributed computing, Kylin can perform
> > pre-calculations in parallel and merge the final results, thereby
> > significantly reducing the processing time.
> >
> > To serve queries by the analyst community, Kylin generates cuboids
> > with all possible combinations of dimensions, and calculates all
> > metrics at different levels. The cuboids are then integrated to form a
> > pre-calculated OLAP cube. All cuboids are key-value structured: keys
> > are composites formed from combinations of multiple dimensions and
> > values are aggregation results for that particular combination of
> > dimensions. Kylin uses HBase to store cubes. HBase is useful because
> > it supports efficient searches across ranges of data.
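
A minimal in-memory sketch of the cuboid idea described above, for illustration only. The function name `build_cube`, the sample rows, and the use of a plain sum as the only metric are assumptions. Kylin itself builds cuboids with distributed jobs and stores them in HBase, which this sketch does not attempt to model.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Return {cuboid: {key: total}} for every combination of dimensions.

    Each cuboid is a tuple of dimension names. Each key is the tuple of
    values those dimensions take in a row, and each value is the summed
    measure for that combination (hypothetical helper, not Kylin's API).
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in cuboid)  # composite key
                totals[key] += row[measure]          # aggregated value
            cube[cuboid] = dict(totals)
    return cube


rows = [
    {"site": "US", "category": "toys", "price": 10},
    {"site": "US", "category": "books", "price": 5},
    {"site": "DE", "category": "toys", "price": 7},
]
cube = build_cube(rows, ("site", "category"), "price")
print(len(cube))                 # number of cuboids for 2 dimensions
print(cube[("site",)][("US",)])  # total price for site US
```

With n dimensions this produces 2^n cuboids, which is why the number of dimensions in a cube has to be chosen with care.
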
> >
> > # Current Status
> >
> > ## Meritocracy
> >
> > Kylin has been deployed in production at eBay and is processing
> > extremely large datasets. The platform has demonstrated great
> > performance benefits and has proved to be a better way for analysts to
> > leverage data on Hadoop with a more convenient approach using their
> > favorite tool.
> >
> > ## Community
> >
> > Kylin seeks to develop developer and user communities during incubation.
> >
> > ## Core Developers
> >
> > Kylin is currently being designed and developed by six engineers from
> > eBay Inc. – Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and
> > Xiaodong Duo. In addition, some outside contributors are actively
> > contributing in design and development. Among them, Julian Hyde from
> > Hortonworks is a very important contributor. All of these core
> > developers have deep expertise in Hadoop and the Hadoop Ecosystem in
> > general.
> >
> > ## Alignment
> >
> > The ASF is a natural host for Kylin given that it is already the home
> > of Hadoop, Pig, Hive, and other emerging cloud software projects.
> > Kylin was designed to offer OLAP capability on Hadoop from the
> > beginning in order to solve data access and analysis challenges in
> > Hadoop clusters. Kylin complements the existing Hadoop analytics area
> > by providing a comprehensive solution based on pre-computed views.
> >
> > In Kylin, we are leveraging an open-source dynamic data management
> > framework called Apache Calcite to parse SQL and plug in our code.
> > Apache Calcite, previously called Optiq, was originally authored by
> > Julian Hyde and is now an Apache Incubator project.
> >
> > # Known Risks
> >
> > ## Orphaned Products
> >
> > The core developers of the Kylin team plan to work full time on this
> > project. There is very little risk of Kylin getting orphaned since at
> > least one large company (eBay) is extensively using it in their
> > production Hadoop clusters. For example, currently there are 3 use
> > cases with more than 12 billion rows and 1000 activity requests per
> > day using Kylin in production. Furthermore, since Kylin was open
> > sourced at the beginning of October 2014, it has received more than
> > 280 stars and been forked nearly 100 times. Kylin has had one major
> > release so far and received 5 pull requests from external
> > contributors in the first month, which further demonstrates that
> > Kylin is a very active project. We plan to extend and diversify this
> > community further through Apache.
> >
> > ## Inexperience with Open Source
> >
> > The core developers are all active users and followers of open source.
> > They are already committers and contributors to the Kylin Github
> > project. All have been involved with the source code that has been
> > released under an open source license, and several of them also have
> > experience developing code in an open source environment. Though the
> > core set of developers does not have Apache open source experience,
> > there are plans to onboard individuals with Apache open source
> > experience onto the project.
> >
> > ## Homogenous Developers
> >
> > The core developers include developers from eBay, Ctrip and
> > Hortonworks. The Apache Incubation process encourages an open and
> > diverse meritocratic community. Apache Kylin has the required amount
> > of diversity, with committers from three different organizations, but
> > is also aware that the bulk of the commits come from a single entity.
> > Kylin intends to make every possible effort to build a diverse,
> > vibrant and involved community and has already received substantial
> > interest from various organizations.
> >
> > ## Reliance on Salaried Developers
> >
> > eBay invested in Kylin as the OLAP solution on top of Hadoop clusters
> > and some of its key engineers are working full time on the project. In
> > addition, since there is a growing Big Data need for scalable OLAP
> > solutions on Hadoop, we look forward to other Apache developers and
> > researchers contributing to the project. Additional contributors,
> > including Apache committers, have plans to join this effort shortly.
> > Also key to addressing the risk associated with relying on salaried
> > developers from a single entity is to increase the diversity of the
> > contributors and actively lobby for domain experts in the BI space to
> > contribute. Apache Kylin intends to do this. One step already taken
> > is to approach the Apache Drill project to explore possible
> > cooperation.
> >
> > ## Relationships with Other Apache Products
> >
> > Kylin has a strong relationship with, and dependency on, Apache Hadoop,
> > HBase, Hive and Calcite. Being part of Apache’s Incubation community
> > could help with closer collaboration among these four projects as
> > well as others.
> >
> > Kylin is likely to have substantial value to Apache Drill due to the
> > common use of Calcite as a query optimization engine and the
> > similarity between Kylin's approach to cubing and Drill's approach to
> > input sources.
> >
> > ## An Excessive Fascination with the Apache Brand
> >
> > Kylin is proposing to enter incubation at Apache in order to help
> > efforts to diversify the committer-base, not so much to capitalize on
> > the Apache brand. The Kylin project is in production use already
> > inside eBay, but is not expected to be an eBay product for external
> > customers. As such, the Kylin project is not seeking to use the Apache
> > brand as a marketing tool.
> >
> > # Documentation
> >
> > Information about Kylin can be found at
> > https://github.com/KylinOLAP/Kylin. The following links provide more
> > information about Kylin in open source:
> >
> > - Kylin web site: http://kylin.io
> > - Codebase at Github: https://github.com/KylinOLAP/Kylin
> > - Issue Tracking: https://github.com/KylinOLAP/Kylin/issues
> > - User community: https://groups.google.com/forum/#!forum/kylin-olap
> >
> > ## Initial Source
> >
> > Kylin has been under development since 2013 by a team of engineers at
> > eBay Inc. It is currently hosted on Github.com under an Apache license
> > at https://github.com/KylinOLAP/Kylin
> >
> > ## External Dependencies
> >
> > Kylin has the following external dependencies.
> >
> > * Basic
> >
> > - JDK 1.6+
> > - Apache Maven
> > - JUnit
> > - DBUnit
> > - Log4j
> > - Slf4j
> > - Apache Commons
> > - Google Guava
> > - Jackson
> >
> > * Hadoop
> >
> > - Apache Hadoop
> > - Apache HBase
> > - Apache Hive
> > - Apache Zookeeper
> > - Apache Curator
> >
> > * Utility
> >
> > - H2
> > - JSCH
> >
> > * REST Service
> >
> > - Spring
> >
> > * Query
> >
> > - Antlr
> > - Apache Calcite (formerly Optiq)
> > - Linq4j
> >
> > * Job
> >
> > - Quartz
> >
> > * Web build tool
> >
> > - NPM
> > - Grunt
> > - bower
> >
> > * Web
> >
> > - Angular JS
> > - jQuery
> > - Bootstrap
> > - D3 JS
> > - ACE
> >
> > ## Cryptography
> >
> > Kylin will eventually support encryption on the wire. This is not one
> > of the initial goals, and we do not expect Kylin to be a controlled
> > export item due to the use of encryption. Kylin supports but does not
> > require the Kerberos authentication mechanism to access secured Hadoop
> > services.
> >
> > # Required Resources
> >
> > ## Mailing List
> >
> > - kylin-private for private PMC discussions (with moderated
> > subscriptions)
> > - kylin-dev
> > - kylin-commits
> >
> > ## Subversion Directory
> >
> > Git is the preferred source control system: git://git.apache.org/Kylin
> >
> > ## Issue Tracking
> >
> > JIRA Kylin (KYLIN)
> >
> > ## Other Resources
> >
> > The existing code already has unit tests so we will make use of
> > existing Apache continuous testing infrastructure. The resulting load
> > should not be very large.
> >
> > # Initial Committers
> >
> > - Jiang Xu <jiangxu.china at gmail dot com>
> > - Luke Han <lukhan at ebay dot com>
> > - Yang Li <yangli9 at ebay dot com>
> > - George Song <ysong1 at ebay dot com>
> > - Hongbin Ma <honma at ebay dot com>
> > - Xiaodong Duo <oranjedog at gmail dot com>
> > - Julian Hyde <jhyde at apache dot org>
> > - Ankur Bansal <abansal at ebay dot com>
> >
> > ## Affiliations
> >
> > The initial committers are employees of eBay Inc., Ctrip and
> > Hortonworks. The nominated mentors are employees of Hortonworks, MapR
> > Technologies and Pivotal.
> >
> > # Sponsors
> >
> > ## Champion
> >
> > - Owen O’Malley <omalley at apache dot org>
> > - Ted Dunning <tdunning at apache dot org>
> >
> > ## Nominated Mentors
> >
> > - Owen O’Malley <omalley at apache dot org> - Apache IPMC member,
> > Co-founder and Senior Architect, Hortonworks
> > - Ted Dunning <tdunning at apache dot org> - Apache IPMC member,
> > Chief Architect, MapR Technologies
> > - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member,
> > Pivotal
> > - Jacques Nadeau <jacques at apache dot org> (pending admission to
> > IPMC) - Apache Drill PMC Chair, MapR Technologies
> >
> > # Sponsoring Entity
> >
> > We are requesting the Incubator to sponsor this project.
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
