Date: Thu, 20 Nov 2014 16:27:43 +0800
Subject: Re: [PROPOSAL] Kylin for Incubation
From: Luke Han
To: general@incubator.apache.org

Hi all,

Thank you for reviewing the proposal. With the discussion winding down, we
would like to send the VOTE email next.

Thanks
Luke

2014-11-15 11:40 GMT+08:00 Ted Dunning:

>
> Also, a Chinese localized operating system is pretty clearly different
> from an OLAP engine.
>
> For comparison see the recent non-issue regarding Amazon Aurora versus
> Apache Aurora.
>
> Sent from my iPhone
>
> > On Nov 14, 2014, at 9:55, Henry Saputra wrote:
> >
> > Thanks for the reminder Ross.
> > Hopefully we can take a route similar to that of Apache Spark, Apache
> > Storm, and Apache MetaModel, where the trademark should be used as
> > 'Apache Kylin'.
> >
> >
> > - Henry
> >
> > On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH)
> > wrote:
> >> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
> >>
> >> Sent from my Windows Phone
> >> ________________________________
> >> From: Luke Han
> >> Sent: 11/14/2014 7:38 AM
> >> To: general@incubator.apache.org
> >> Subject: [PROPOSAL] Kylin for Incubation
> >>
> >> Hi all,
> >> We would like to propose Kylin as an Apache Incubator project.
> >> The complete proposal can be found at
> >> https://wiki.apache.org/incubator/KylinProposal, and the text of the
> >> proposal is posted below.
> >>
> >> Thanks.
> >> Luke
> >>
> >>
> >> Kylin Proposal
> >> ==============
> >>
> >> # Abstract
> >>
> >> Kylin is a distributed and scalable OLAP engine built on Hadoop to
> >> support extremely large datasets.
> >>
> >> # Proposal
> >>
> >> Kylin is an open source Distributed Analytics Engine that provides
> >> multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to
> >> accelerate analytics on Hadoop by allowing the use of SQL-compatible
> >> tools. Kylin provides a SQL interface and multi-dimensional analysis
> >> (MOLAP) on Hadoop to support extremely large datasets, and it
> >> integrates tightly with the Hadoop ecosystem.
> >>
> >> ## Overview of Kylin
> >>
> >> The Kylin platform has two parts, data processing and interactive
> >> querying. First, Kylin reads data from the source, Hive, and runs a
> >> set of tasks, including MapReduce jobs and shell scripts, to
> >> pre-calculate results for a specified data model, then saves the
> >> resulting OLAP cube into storage such as HBase. Once these OLAP cubes
> >> are ready, a user can submit a request from any SQL-based tool or
> >> third-party application to Kylin’s REST server. The server calls the
> >> Query Engine to determine if the target dataset already exists. If
> >> so, the engine directly accesses the target data in the form of a
> >> predefined cube, and returns the result with sub-second latency.
> >> Otherwise, the engine is designed to route non-matching queries to
> >> whichever SQL-on-Hadoop tool is already available on the Hadoop
> >> cluster, such as Hive.
> >>
> >> The Kylin platform includes:
> >>
> >> - Metadata Manager: Kylin is a metadata-driven application. The Kylin
> >> Metadata Manager is the key component that manages all metadata stored
> >> in Kylin, including all cube metadata.
> >> All other components rely on the Metadata Manager.
> >>
> >> - Job Engine: This engine is designed to handle all of the offline
> >> jobs, including shell scripts, Java API calls, and MapReduce jobs. The
> >> Job Engine manages and coordinates all of the jobs in Kylin, making
> >> sure each job executes and handling failures.
> >>
> >> - Storage Engine: This engine manages the underlying storage –
> >> specifically, the cuboids, which are stored as key-value pairs. The
> >> Storage Engine uses HBase – the best solution from the Hadoop
> >> ecosystem for leveraging an existing K-V system. Kylin can also be
> >> extended to support other K-V systems, such as Redis.
> >>
> >> - Query Engine: Once the cube is ready, the Query Engine can receive
> >> and parse user queries. It then interacts with other components to
> >> return the results to the user.
> >>
> >> - REST Server: The REST Server is an entry point for applications to
> >> develop against Kylin. Applications can submit queries, get results,
> >> trigger cube build jobs, get metadata, get user privileges, and so on.
> >>
> >> - ODBC Driver: To support third-party tools and applications – such as
> >> Tableau – we have built and open-sourced an ODBC Driver. The goal is
> >> to make it easy for users to onboard.
> >>
> >> # Background
> >>
> >> The challenge we face at eBay is that our data volume is becoming
> >> bigger and bigger while our user base is becoming more diverse. For
> >> example, our business users and analysts consistently ask for minimal
> >> latency when visualizing data in Tableau and Excel.
> >> So, we worked closely with our internal analyst community and
> >> outlined the product requirements for Kylin:
> >>
> >> - Sub-second query latency on billions of rows
> >> - ANSI SQL availability for those using SQL-compatible tools
> >> - Full OLAP capability to offer advanced functionality
> >> - Support for high cardinality and very large dimensions
> >> - High concurrency for thousands of users
> >> - Distributed and scale-out architecture for analysis in the TB to PB
> >> size range
> >>
> >> Existing SQL-on-Hadoop solutions commonly need to perform partial or
> >> full table or file scans to compute the results of queries. The cost
> >> of these large data scans can make many queries very slow (more than a
> >> minute). The core idea of MOLAP (multi-dimensional OLAP) is to
> >> pre-compute data along dimensions of interest and store the resulting
> >> aggregates as a "cube". MOLAP is much faster but is inflexible. We
> >> realized that no existing external product met our exact
> >> requirements, especially in the open source Hadoop community. To meet
> >> our emerging business needs, we built a platform from scratch to
> >> support MOLAP for these business requirements and then to support
> >> others, including ROLAP. With an excellent development team and
> >> several pilot customers, we have been able to bring the Kylin platform
> >> into production as well as open source it.
> >>
> >> # Rationale
> >>
> >> When data grows to petabyte scale, pre-calculating a query takes a
> >> long time and requires costly, powerful hardware. However, with the
> >> benefit of Hadoop’s distributed computing architecture, jobs can
> >> leverage hundreds or thousands of Hadoop data nodes. There still
> >> exists a big gap between the growing volume of data and interactive
> >> analytics:
> >>
> >> - Existing Business Intelligence (OLAP) platforms cannot scale out to
> >> support fast-growing data.
> >> - Existing SQL-on-Hadoop projects are not designed for OLAP use
> >> cases; joins of huge tables will always take a long time to scan and
> >> calculate.
> >> - No mature OLAP solution exists on Hadoop.
> >>
> >> As mentioned in the background, the business requirements triggered by
> >> the increase in data volume drove eBay to invest in building a
> >> solution from scratch to offer analytics capability on Hadoop
> >> clusters. With Hadoop’s power of distributed computing, Kylin can
> >> perform pre-calculations in parallel and merge the final results,
> >> thereby significantly reducing the processing time.
> >>
> >> To serve queries by the analyst community, Kylin generates cuboids
> >> with all possible combinations of dimensions, and calculates all
> >> metrics at different levels. The cuboids are then integrated to form a
> >> pre-calculated OLAP cube. All cuboids are key-value structured: keys
> >> are composites formed from combinations of multiple dimensions, and
> >> values are aggregation results for that particular combination of
> >> dimensions. Kylin uses HBase to store cubes. HBase is useful because
> >> it supports efficient searches across ranges of data.
> >>
> >> # Current Status
> >>
> >> ## Meritocracy
> >>
> >> Kylin has been deployed in production at eBay and is processing
> >> extremely large datasets. The platform has demonstrated great
> >> performance benefits and has proved to be a better way for analysts to
> >> leverage data on Hadoop with a more convenient approach using their
> >> favorite tool.
> >>
> >> ## Community
> >>
> >> Kylin seeks to develop developer and user communities during
> >> incubation.
> >>
> >> ## Core Developers
> >>
> >> Kylin is currently being designed and developed by six engineers from
> >> eBay Inc. – Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and
> >> Xiaodong Duo. In addition, some outside contributors are actively
> >> contributing to design and development.
> >> Among them, Julian Hyde from Hortonworks is a very important
> >> contributor. All of these core developers have deep expertise in
> >> Hadoop and the Hadoop ecosystem in general.
> >>
> >> ## Alignment
> >>
> >> The ASF is a natural host for Kylin given that it is already the home
> >> of Hadoop, Pig, Hive, and other emerging cloud software projects.
> >> Kylin was designed to offer OLAP capability on Hadoop from the
> >> beginning in order to solve data access and analysis challenges in
> >> Hadoop clusters. Kylin complements the existing Hadoop analytics area
> >> by providing a comprehensive solution based on pre-computed views.
> >>
> >> In Kylin, we are leveraging an open-source dynamic data management
> >> framework called Apache Calcite to parse SQL and plug in our code.
> >> Apache Calcite, previously called Optiq, was originally authored by
> >> Julian Hyde and is now an Apache Incubator project.
> >>
> >> # Known Risks
> >>
> >> ## Orphaned Products
> >>
> >> The core developers of the Kylin team plan to work full time on this
> >> project. There is very little risk of Kylin getting orphaned since at
> >> least one large company (eBay) is extensively using it in their
> >> production Hadoop clusters. For example, currently there are 3 use
> >> cases with more than 12 billion rows and 1000 activity requests per
> >> day using Kylin in production. Furthermore, since Kylin was open
> >> sourced at the beginning of October 2014, it has received more than
> >> 280 stars and been forked nearly 100 times. Kylin has had one major
> >> release so far and received 5 pull requests from external
> >> contributors in the first month, which further demonstrates that
> >> Kylin is a very active project. We plan to extend and diversify this
> >> community further through Apache.
> >>
> >> ## Inexperience with Open Source
> >>
> >> The core developers are all active users and followers of open source.
> >> They are already committers and contributors to the Kylin GitHub
> >> project. All have been involved with the source code that has been
> >> released under an open source license, and several of them also have
> >> experience developing code in an open source environment. Though the
> >> core set of developers do not have Apache open source experience,
> >> there are plans to onboard individuals with Apache open source
> >> experience on to the project.
> >>
> >> ## Homogenous Developers
> >>
> >> The core developers include developers from eBay, Ctrip and
> >> Hortonworks. The Apache Incubation process encourages an open and
> >> diverse meritocratic community. Apache Kylin has the required amount
> >> of diversity with committers from three different organizations, but
> >> is also aware that the bulk of the commits come from a single entity.
> >> Kylin intends to make every possible effort to build a diverse,
> >> vibrant and involved community and has already received substantial
> >> interest from various organizations.
> >>
> >> ## Reliance on Salaried Developers
> >>
> >> eBay invested in Kylin as the OLAP solution on top of Hadoop clusters
> >> and some of its key engineers are working full time on the project. In
> >> addition, since there is a growing Big Data need for scalable OLAP
> >> solutions on Hadoop, we look forward to other Apache developers and
> >> researchers contributing to the project. Additional contributors,
> >> including Apache committers, have plans to join this effort shortly.
> >> Also key to addressing the risk associated with relying on salaried
> >> developers from a single entity is to increase the diversity of the
> >> contributors and actively lobby for domain experts in the BI space to
> >> contribute. Apache Kylin intends to do this. One approach already
> >> taken is to approach the Apache Drill project to explore possible
> >> cooperation.
> >>
> >> ## Relationships with Other Apache Products
> >>
> >> Kylin has a strong relationship with and dependency on Apache Hadoop,
> >> HBase, Hive and Calcite. Being part of Apache’s Incubation community
> >> could help with closer collaboration among these four projects as
> >> well as others.
> >>
> >> Kylin is likely to have substantial value to Apache Drill due to the
> >> common use of Calcite as a query optimization engine and the
> >> similarities between Kylin's approach to cubing and Drill's approach
> >> to input sources.
> >>
> >> ## An Excessive Fascination with the Apache Brand
> >>
> >> Kylin is proposing to enter incubation at Apache in order to help
> >> efforts to diversify the committer base, not so much to capitalize on
> >> the Apache brand. The Kylin project is in production use already
> >> inside eBay, but is not expected to be an eBay product for external
> >> customers. As such, the Kylin project is not seeking to use the Apache
> >> brand as a marketing tool.
> >>
> >> # Documentation
> >>
> >> Information about Kylin can be found at
> >> https://github.com/KylinOLAP/Kylin. The following links provide more
> >> information about Kylin in open source:
> >>
> >> - Kylin web site: http://kylin.io
> >> - Codebase at GitHub: https://github.com/KylinOLAP/Kylin
> >> - Issue Tracking: https://github.com/KylinOLAP/Kylin/issues
> >> - User community: https://groups.google.com/forum/#!forum/kylin-olap
> >>
> >> ## Initial Source
> >>
> >> Kylin has been under development since 2013 by a team of engineers at
> >> eBay Inc. It is currently hosted on GitHub under an Apache license
> >> at https://github.com/KylinOLAP/Kylin
> >>
> >> ## External Dependencies
> >>
> >> Kylin has the following external dependencies.
> >>
> >> * Basic
> >>
> >> - JDK 1.6+
> >> - Apache Maven
> >> - JUnit
> >> - DBUnit
> >> - Log4j
> >> - Slf4j
> >> - Apache Commons
> >> - Google Guava
> >> - Jackson
> >>
> >> * Hadoop
> >>
> >> - Apache Hadoop
> >> - Apache HBase
> >> - Apache Hive
> >> - Apache ZooKeeper
> >> - Apache Curator
> >>
> >> * Utility
> >>
> >> - H2
> >> - JSCH
> >>
> >> * REST Service
> >>
> >> - Spring
> >>
> >> * Query
> >>
> >> - Antlr
> >> - Apache Calcite (formerly Optiq)
> >> - Linq4j
> >>
> >> * Job
> >>
> >> - Quartz
> >>
> >> * Web build tool
> >>
> >> - NPM
> >> - Grunt
> >> - bower
> >>
> >> * Web
> >>
> >> - Angular JS
> >> - jQuery
> >> - Bootstrap
> >> - D3 JS
> >> - ACE
> >>
> >> ## Cryptography
> >>
> >> Kylin will eventually support encryption on the wire. This is not one
> >> of the initial goals, and we do not expect Kylin to be a controlled
> >> export item due to the use of encryption. Kylin supports but does not
> >> require the Kerberos authentication mechanism to access secured Hadoop
> >> services.
> >>
> >> # Required Resources
> >>
> >> ## Mailing List
> >>
> >> - kylin-private for private PMC discussions (with moderated
> >> subscriptions)
> >> - kylin-dev
> >> - kylin-commits
> >>
> >> ## Subversion Directory
> >>
> >> Git is the preferred source control system: git://git.apache.org/Kylin
> >>
> >> ## Issue Tracking
> >>
> >> JIRA Kylin (KYLIN)
> >>
> >> ## Other Resources
> >>
> >> The existing code already has unit tests, so we will make use of
> >> existing Apache continuous testing infrastructure. The resulting load
> >> should not be very large.
> >>
> >> # Initial Committers
> >>
> >> - Jiang Xu < jiangxu.china at gmail dot com>
> >> - Luke Han
> >> - Yang Li
> >> - George Song
> >> - Hongbin Ma
> >> - Xiaodong Duo < oranjedog at gmail dot com>
> >> - Julian Hyde < jhyde at apache dot org >
> >> - Ankur Bansal < abansal at ebay dot com>
> >>
> >> ## Affiliations
> >>
> >> The initial committers are employees of eBay Inc., Ctrip and
> >> Hortonworks.
> >> The nominated mentors are employees of Hortonworks, MapR
> >> Technologies and Pivotal.
> >>
> >> # Sponsors
> >>
> >> ## Champion
> >>
> >> - Owen O’Malley < omalley at apache dot org >
> >> - Ted Dunning
> >>
> >> ## Nominated Mentors
> >>
> >> - Owen O’Malley < omalley at apache dot org > - Apache IPMC member,
> >> Co-founder and Senior Architect, Hortonworks
> >> - Ted Dunning < tdunning at apache dot org> - Apache IPMC member,
> >> Chief Architect, MapR Technologies
> >> - Henry Saputra - Apache IPMC member, Pivotal
> >> - Jacques Nadeau (pending admission to
> >> IPMC) - Apache Drill PMC Chair, MapR Technologies
> >>
> >> # Sponsoring Entity
> >>
> >> We are requesting the Incubator to sponsor this project.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
>

--
Best Regards!
---------------------
Luke Han