Date: Sat, 15 Nov 2014 00:28:27 +0800
From: Luke Han
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Kylin for Incubation

Checking again with Apache trademarks is the safer way to keep using this name. I will contact them and do the check again. Thank you very much for pointing this out.

Luke

2014-11-15 0:01 GMT+08:00 Ross Gardler (MS OPEN TECH) <Ross.Gardler@microsoft.com>:

> Please check with VP Trademarks here at Apache.
>
> ________________________________
> From: Luke Han
> Sent: 11/14/2014 8:00 AM
> To: general@incubator.apache.org
> Subject: Re: [PROPOSAL] Kylin for Incubation
>
> We have been aware of this from the beginning; below is the comment from our legal team:
> "We've done a preliminary trademark search for Kylin in the US, and there weren't any directly conflicting brands."
>
> I think it should be OK to use :)
>
> Thanks.
>
> Luke
>
> 2014-11-14 23:47 GMT+08:00 Ross Gardler (MS OPEN TECH) <Ross.Gardler@microsoft.com>:
>
> > Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
> >
> > ________________________________
> > From: Luke Han
> > Sent: 11/14/2014 7:38 AM
> > To: general@incubator.apache.org
> > Subject: [PROPOSAL] Kylin for Incubation
> >
> > Hi all,
> > We would like to propose Kylin as an Apache Incubator project. The complete proposal can be found at https://wiki.apache.org/incubator/KylinProposal, and its text is posted below.
> >
> > Thanks.
> > Luke
> >
> >
> > Kylin Proposal
> > ==============
> >
> > # Abstract
> >
> > Kylin is a distributed and scalable OLAP engine built on Hadoop to support extremely large datasets.
> >
> > # Proposal
> >
> > Kylin is an open source Distributed Analytics Engine that provides multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to accelerate analytics on Hadoop by allowing the use of SQL-compatible tools. It provides a SQL interface and MOLAP on Hadoop, supports extremely large datasets, and integrates tightly with the Hadoop ecosystem.
> >
> > ## Overview of Kylin
> >
> > The Kylin platform has two parts: offline data processing and interactive querying. First, Kylin reads data from a source (Hive) and runs a set of tasks, including MapReduce jobs and shell scripts, to pre-calculate results for a specified data model; it then saves the resulting OLAP cube into storage such as HBase. Once these OLAP cubes are ready, a user can submit a request from any SQL-based tool or third-party application to Kylin's REST server. The server calls the Query Engine to determine whether the target dataset already exists.
> > If so, the engine directly accesses the target data in the form of a predefined cube and returns the result with sub-second latency. Otherwise, the engine is designed to route non-matching queries to whichever SQL-on-Hadoop tool is already available on the Hadoop cluster, such as Hive.
> >
> > The Kylin platform includes:
> >
> > - Metadata Manager: Kylin is a metadata-driven application. The Kylin Metadata Manager is the key component that manages all metadata stored in Kylin, including all cube metadata. All other components rely on the Metadata Manager.
> >
> > - Job Engine: This engine is designed to handle all of the offline jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and that failures are handled.
> >
> > - Storage Engine: This engine manages the underlying storage, specifically the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase, the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis.
> >
> > - Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user.
> >
> > - REST Server: The REST Server is the entry point for applications developed against Kylin. Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on.
> >
> > - ODBC Driver: To support third-party tools and applications such as Tableau, we have built and open-sourced an ODBC driver. The goal is to make it easy for users to onboard.
> >
> > # Background
> >
> > The challenge we face at eBay is that our data volume keeps growing while our user base is becoming more diverse.
> > For example, our business users and analysts consistently ask for minimal latency when visualizing data in Tableau and Excel. So we worked closely with our internal analyst community and outlined the product requirements for Kylin:
> >
> > - Sub-second query latency on billions of rows
> > - ANSI SQL availability for those using SQL-compatible tools
> > - Full OLAP capability to offer advanced functionality
> > - Support for high cardinality and very large dimensions
> > - High concurrency for thousands of users
> > - Distributed and scale-out architecture for analysis in the TB to PB size range
> >
> > Existing SQL-on-Hadoop solutions commonly need to perform partial or full table or file scans to compute query results. The cost of these large scans can make many queries very slow (more than a minute). The core idea of MOLAP (multi-dimensional OLAP) is to pre-compute data along the dimensions of interest and store the resulting aggregates as a "cube". MOLAP is much faster but inflexible. We found that no existing product met our exact requirements, especially in the open source Hadoop community. To meet our emerging business needs, we built a platform from scratch to support MOLAP for these requirements, and then to support other approaches, including ROLAP. With an excellent development team and several pilot customers, we have been able to bring the Kylin platform into production as well as open source it.
> >
> > # Rationale
> >
> > When data grows to petabyte scale, pre-calculating query results takes a long time and requires costly, powerful hardware. However, with Hadoop's distributed computing architecture, jobs can leverage hundreds or thousands of Hadoop data nodes.
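The MOLAP idea described above (pre-compute aggregates along the dimensions of interest, then answer queries from the stored cube) can be made concrete with a small in-memory sketch. It is an illustration under stated assumptions, not Kylin's code: the dimension names, the `price` measure, and the functions `build_cube`, `merge_cubes` and `query` are hypothetical; a Python dict stands in for the HBase key-value store; and a full scan of the raw rows stands in for routing an unmatched query to Hive. Each cuboid covers one combination of dimensions, its keys are tuples of dimension values, and its values are (SUM, COUNT) pairs. Because SUM and COUNT are additive, partial cubes built on separate data partitions can be merged, which loosely mirrors running the pre-calculation in parallel across data nodes.

```python
from itertools import combinations

# Hypothetical schema for illustration only: three dimensions and one numeric measure.
DIMENSIONS = ("site", "category", "day")
MEASURE = "price"


def all_cuboids(dims):
    """Yield every subset of `dims` (2**n cuboids, including the empty grand-total cuboid)."""
    for k in range(len(dims) + 1):
        yield from combinations(dims, k)


def aggregate(rows, cuboid, measure=MEASURE):
    """Full scan: SUM and COUNT of `measure`, grouped by the dimensions in `cuboid`."""
    cells = {}
    for row in rows:
        key = tuple(row[d] for d in cuboid)  # composite key built from the cuboid's dimension values
        s, n = cells.get(key, (0, 0))
        cells[key] = (s + row[measure], n + 1)
    return cells


def build_cube(rows, dims=DIMENSIONS, measure=MEASURE):
    """Pre-compute one cuboid per subset of `dims`: {cuboid: {key: (sum, count)}}.

    `rows` must be a list, because it is scanned once per cuboid.
    """
    return {c: aggregate(rows, c, measure) for c in all_cuboids(dims)}


def merge_cubes(partials):
    """Merge cubes built on separate data partitions; SUM and COUNT are additive."""
    merged = {}
    for cube in partials:
        for cuboid, cells in cube.items():
            target = merged.setdefault(cuboid, {})
            for key, (s, n) in cells.items():
                ts, tn = target.get(key, (0, 0))
                target[key] = (ts + s, tn + n)
    return merged


def query(cube, group_by, rows=None, dims=DIMENSIONS, measure=MEASURE):
    """Return {key: (sum, count)} for a GROUP BY over `group_by`.

    Served from the matching pre-computed cuboid when one exists; otherwise the
    raw rows are scanned (a stand-in for routing the query to Hive).
    """
    unknown = set(group_by) - set(dims)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    cuboid = tuple(d for d in dims if d in group_by)  # canonical dimension order
    if cuboid in cube:
        return cube[cuboid]
    if rows is None:
        raise LookupError(f"no pre-computed cuboid for {cuboid} and no fallback source")
    return aggregate(rows, cuboid, measure)


if __name__ == "__main__":
    rows = [
        {"site": "US", "category": "books", "day": "2014-11-01", "price": 10},
        {"site": "US", "category": "music", "day": "2014-11-01", "price": 5},
        {"site": "DE", "category": "books", "day": "2014-11-02", "price": 7},
        {"site": "US", "category": "books", "day": "2014-11-02", "price": 3},
    ]
    # Two "nodes" each pre-compute a partial cube over their own rows; the partials are merged.
    cube = merge_cubes([build_cube(rows[:2]), build_cube(rows[2:])])
    print(query(cube, ["site"]))  # answered from the ("site",) cuboid
```

Materializing every cuboid turns each GROUP BY into a lookup, but the number of cuboids doubles with every added dimension. A cube built over only some dimensions, e.g. `build_cube(rows, dims=("site", "category"))`, still answers GROUP BY queries on those dimensions from storage, while a GROUP BY on `day` needs `rows` for the fallback scan and otherwise raises `LookupError`.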
> > There still exists a big gap between the growing volume of data and interactive analytics:
> >
> > - Existing Business Intelligence (OLAP) platforms cannot scale out to support fast-growing data.
> > - Existing SQL-on-Hadoop projects are not designed for OLAP use cases; joins of huge tables always take a long time to scan and calculate.
> > - No mature OLAP solution exists on Hadoop.
> >
> > As mentioned in the Background, the business requirements triggered by the increase in data volume drove eBay to invest in building a solution from scratch to offer analytics capability on Hadoop clusters. With Hadoop's distributed computing power, Kylin can perform pre-calculations in parallel and merge the final results, thereby significantly reducing the processing time.
> >
> > To serve queries from the analyst community, Kylin generates cuboids for all possible combinations of dimensions and calculates all metrics at different levels. The cuboids are then integrated to form a pre-calculated OLAP cube. All cuboids are key-value structured: keys are composites formed from combinations of multiple dimensions, and values are the aggregation results for that particular combination of dimensions. Kylin uses HBase to store cubes. HBase is useful because it supports efficient searches across ranges of data.
> >
> > # Current Status
> >
> > ## Meritocracy
> >
> > Kylin has been deployed in production at eBay and is processing extremely large datasets. The platform has demonstrated great performance benefits and has proved to be a more convenient way for analysts to leverage data on Hadoop from their favorite tools.
> >
> > ## Community
> >
> > Kylin seeks to develop developer and user communities during incubation.
> >
> > ## Core Developers
> >
> > Kylin is currently being designed and developed by six engineers from eBay Inc.:
> > Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and Xiaodong Duo. In addition, several outside contributors are actively contributing to design and development. Among them, Julian Hyde from Hortonworks is a very important contributor. All of these core developers have deep expertise in Hadoop and the Hadoop ecosystem in general.
> >
> > ## Alignment
> >
> > The ASF is a natural host for Kylin given that it is already the home of Hadoop, Pig, Hive, and other emerging cloud software projects. Kylin was designed from the beginning to offer OLAP capability on Hadoop in order to solve data access and analysis challenges in Hadoop clusters. Kylin complements the existing Hadoop analytics area by providing a comprehensive solution based on pre-computed views.
> >
> > In Kylin, we leverage an open-source dynamic data management framework called Apache Calcite to parse SQL and plug in our code. Apache Calcite, previously called Optiq, was originally authored by Julian Hyde and is now an Apache Incubator project.
> >
> > # Known Risks
> >
> > ## Orphaned Products
> >
> > The core developers of the Kylin team plan to work full time on this project. There is very little risk of Kylin getting orphaned, since at least one large company (eBay) is extensively using it in its production Hadoop clusters. For example, there are currently 3 use cases in production on Kylin with more than 12 billion rows and 1,000 requests per day. Furthermore, since Kylin was open sourced at the beginning of October 2014, it has received more than 280 stars and been forked nearly 100 times. Kylin has had one major release so far and received 5 pull requests from external contributors in its first month, which further demonstrates that Kylin is a very active project. We plan to extend and diversify this community further through Apache.
> >
> > ## Inexperience with Open Source
> >
> > The core developers are all active users and followers of open source. They are already committers and contributors to the Kylin GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core set of developers does not have Apache open source experience, there are plans to onboard individuals with Apache open source experience onto the project.
> >
> > ## Homogenous Developers
> >
> > The core developers include developers from eBay, Ctrip and Hortonworks. The Apache Incubation process encourages an open and diverse meritocratic community. Apache Kylin has the required amount of diversity, with committers from three different organizations, but is also aware that the bulk of the commits come from a single entity. Kylin intends to make every possible effort to build a diverse, vibrant and involved community, and has already received substantial interest from various organizations.
> >
> > ## Reliance on Salaried Developers
> >
> > eBay invested in Kylin as the OLAP solution on top of its Hadoop clusters, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions on Hadoop, we look forward to other Apache developers and researchers contributing to the project. Additional contributors, including Apache committers, plan to join this effort shortly. Also key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of the contributors and actively lobbying domain experts in the BI space to contribute. Apache Kylin intends to do this. One step already taken is approaching the Apache Drill project to explore possible cooperation.
> >
> > ## Relationships with Other Apache Products
> >
> > Kylin has a strong relationship with, and dependency on, Apache Hadoop, HBase, Hive and Calcite. Being part of Apache's Incubation community could help foster closer collaboration among these four projects as well as others.
> >
> > Kylin is likely to have substantial value to Apache Drill due to the common use of Calcite as a query optimization engine and the similarities between Kylin's approach to cubing and Drill's approach to input sources.
> >
> > ## An Excessive Fascination with the Apache Brand
> >
> > Kylin is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Kylin project is already in production use inside eBay, but is not expected to be an eBay product for external customers. As such, the Kylin project is not seeking to use the Apache brand as a marketing tool.
> >
> > # Documentation
> >
> > Information about Kylin can be found at https://github.com/KylinOLAP/Kylin. The following links provide more information about Kylin in open source:
> >
> > - Kylin web site: http://kylin.io
> > - Codebase at GitHub: https://github.com/KylinOLAP/Kylin
> > - Issue Tracking: https://github.com/KylinOLAP/Kylin/issues
> > - User community: https://groups.google.com/forum/#!forum/kylin-olap
> >
> > ## Initial Source
> >
> > Kylin has been under development since 2013 by a team of engineers at eBay Inc. It is currently hosted on GitHub under an Apache license at https://github.com/KylinOLAP/Kylin
> >
> > ## External Dependencies
> >
> > Kylin has the following external dependencies.
> >
> > * Basic
> >
> > - JDK 1.6+
> > - Apache Maven
> > - JUnit
> > - DBUnit
> > - Log4j
> > - Slf4j
> > - Apache Commons
> > - Google Guava
> > - Jackson
> >
> > * Hadoop
> >
> > - Apache Hadoop
> > - Apache HBase
> > - Apache Hive
> > - Apache Zookeeper
> > - Apache Curator
> >
> > * Utility
> >
> > - H2
> > - JSCH
> >
> > * REST Service
> >
> > - Spring
> >
> > * Query
> >
> > - Antlr
> > - Apache Calcite (formerly Optiq)
> > - Linq4j
> >
> > * Job
> >
> > - Quartz
> >
> > * Web build tool
> >
> > - NPM
> > - Grunt
> > - bower
> >
> > * Web
> >
> > - Angular JS
> > - jQuery
> > - Bootstrap
> > - D3 JS
> > - ACE
> >
> > ## Cryptography
> >
> > Kylin will eventually support encryption on the wire. This is not one of the initial goals, and we do not expect Kylin to be a controlled export item due to the use of encryption. Kylin supports, but does not require, the Kerberos authentication mechanism to access secured Hadoop services.
> >
> > # Required Resources
> >
> > ## Mailing List
> >
> > - kylin-private for private PMC discussions (with moderated subscriptions)
> > - kylin-dev
> > - kylin-commits
> >
> > ## Subversion Directory
> >
> > Git is the preferred source control system: git://git.apache.org/Kylin
> >
> > ## Issue Tracking
> >
> > JIRA Kylin (KYLIN)
> >
> > ## Other Resources
> >
> > The existing code already has unit tests, so we will make use of the existing Apache continuous testing infrastructure. The resulting load should not be very large.
> >
> > # Initial Committers
> >
> > - Jiang Xu <jiangxu.china at gmail dot com>
> > - Luke Han
> > - Yang Li
> > - George Song
> > - Hongbin Ma
> > - Xiaodong Duo <oranjedog at gmail dot com>
> > - Julian Hyde <jhyde at apache dot org>
> > - Ankur Bansal <abansal at ebay dot com>
> >
> > ## Affiliations
> >
> > The initial committers are employees of eBay Inc., Ctrip and Hortonworks. The nominated mentors are employees of Hortonworks, MapR Technologies and Pivotal.
> >
> > # Sponsors
> >
> > ## Champion
> >
> > - Owen O'Malley <omalley at apache dot org>
> > - Ted Dunning
> >
> > ## Nominated Mentors
> >
> > - Owen O'Malley <omalley at apache dot org> - Apache IPMC member, Co-founder and Senior Architect, Hortonworks
> > - Ted Dunning <tdunning at apache dot org> - Apache IPMC member, Chief Architect, MapR Technologies
> > - Henry Saputra - Apache IPMC member, Pivotal
> > - Jacques Nadeau (pending admission to IPMC) - Apache Drill PMC Chair, MapR Technologies
> >
> > # Sponsoring Entity
> >
> > We are requesting the Incubator to sponsor this project.
> >
> >
> > --
> > Best Regards!
> > ---------------------
> > Luke Han
>

--
Best Regards!
---------------------
Luke Han