incubator-general mailing list archives

From Craig Russell <apache....@gmail.com>
Subject [RESULT] [VOTE] Accept Crail into the Apache Incubator
Date Wed, 01 Nov 2017 14:40:15 GMT
Subject line changed to close the vote.

> On Nov 1, 2017, at 6:42 AM, Luciano Resende <luckbr1975@gmail.com> wrote:
> 
> On Thu, Oct 26, 2017 at 8:31 AM, Luciano Resende <luckbr1975@gmail.com>
> wrote:
> 
>> Now that the discussion thread on the Crail proposal has ended, please
>> vote on accepting Crail into the Apache Incubator.
>> 
>> The ASF voting rules are described at:
>>   http://www.apache.org/foundation/voting.html
>> 
>> A vote for accepting a new Apache Incubator podling is a majority vote
>> for which only Incubator PMC member votes are binding.
>> 
>> Votes from other people are also welcome as an indication of people's
>> enthusiasm (or lack thereof).
>> 
>> Please do not use this VOTE thread for discussions.
>> If needed, start a new thread instead.
>> 
>> This vote will run for at least 72 hours. Please VOTE as follows:
>> [] +1 Accept Crail into the Apache Incubator
>> [] +0 Abstain.
>> [] -1 Do not accept Crail into the Apache Incubator because ...
>> 
>> The proposal below is also on the wiki:
>> https://wiki.apache.org/incubator/CrailProposal
>> 
>> ===
>> 
>> Abstract
>> 
>> Crail is a storage platform for sharing performance critical data in
>> distributed data processing jobs at very high speed. Crail is built
>> entirely upon principles of user-level I/O and specifically targets data
>> center deployments with fast network and storage hardware (e.g., 100Gbps
>> RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation
>> such as resource disaggregation or serverless computing. Crail is written in
>> Java and integrates seamlessly with the Apache data processing ecosystem.
>> It can be used as a backbone to accelerate high-level data operations such
>> as shuffle or broadcast, or as a cache to store hot data that is queried
>> repeatedly, or as a storage platform for sharing inter-job data in complex
>> multi-job pipelines, etc.
>> 
>> Proposal
>> 
>> Crail enables Apache data processing frameworks to run efficiently in next
>> generation data centers using fast storage and network hardware in
>> combination with resource (e.g., DRAM, Flash) disaggregation.
>> 
>> Background
>> 
>> Crail started as a research project at the IBM Zurich Research Laboratory
>> around 2014 aiming to integrate high-speed I/O hardware effectively into
>> large scale data processing systems.
>> 
>> Rationale
>> 
>> During the last decade, I/O hardware has undergone rapid performance
>> improvements, typically by orders of magnitude. Modern networking
>> and storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with
>> access latencies of a few microseconds. However, despite such progress in
>> raw I/O performance, effectively leveraging modern hardware in data
>> processing frameworks remains challenging. In most cases, upgrading to
>> high-end networking or storage hardware has very little effect on the performance of
>> analytics workloads. The problem comes from heavily layered software
>> imposing overheads such as deep call stacks, unnecessary data copies,
>> thread contention, etc. These problems have already been addressed at the
>> operating system level with new I/O APIs such as RDMA verbs, NVMe, etc.,
>> allowing applications to bypass software layers during I/O operations.
>> Distributed data processing frameworks, on the other hand, are typically
>> implemented on legacy I/O interfaces such as sockets or block
>> storage. These interfaces have been shown to be insufficient to deliver the
>> full hardware performance. Yet, to the best of our knowledge, there are no
>> active and systematic efforts to integrate these new user level I/O APIs
>> into Apache software frameworks. This problem affects all end-users and
>> organizations that use Apache software. We expect them to see
>> disappointingly small performance gains when upgrading their networking
>> and storage hardware.
>> 
>> Crail solves this problem by providing an efficient storage platform built
>> upon user-level I/O, thus bypassing layers such as the JVM and OS during I/O
>> operations. Moreover, Crail directly leverages the specific hardware
>> features of RDMA and NVMe to provide a better integration with high-level
>> data operations in Apache compute frameworks. As a consequence, Crail
>> enables users to run larger, more complex queries against ever increasing
>> amounts of data at a speed largely determined by the deployed hardware.
>> Crail is a generic solution that integrates well with the Apache ecosystem,
>> including frameworks like Spark, Hadoop, Hive, etc.
>> 
>> Initial Goals
>> 
>> The initial goals of moving Crail to the Apache Incubator are to broaden
>> the community and to foster contributions from developers to leverage
>> Crail in various data processing frameworks and workloads. Ultimately, the
>> goal for Crail is to become the de facto standard platform for storing
>> temporary performance-critical data in distributed data processing systems.
>> 
>> Current Status
>> 
>> The initial code has been developed at the IBM Zurich Research Center and
>> has recently been made available on GitHub under the Apache Software
>> License 2.0. The project currently has explicit support for Spark and
>> Hadoop. Project documentation is available on the website www.crail.io.
>> There is also a public forum for discussions related to Crail available at
>> https://groups.google.com/forum/#!forum/zrlio-users.
>> 
>> Meritocracy
>> 
>> The current developers are familiar with the meritocratic open source
>> development process at Apache. Over the last year, the project has gathered
>> interest on GitHub and several companies have already expressed interest in
>> the project. We plan to invest in supporting a meritocracy by inviting
>> additional developers to participate.
>> 
>> Community
>> 
>> The need for a generic solution to integrate high-performance I/O hardware
>> in open source software is tremendous, so there is potential for a very large
>> community. We believe that Crail’s extensible architecture and its
>> alignment with the Apache Ecosystem will further encourage community
>> participation. We expect that over time Crail will attract a large
>> community.
>> 
>> Alignment
>> 
>> Crail is written in Java and is built for the Apache data processing
>> ecosystem. The basic storage services of Crail can be used seamlessly from
>> Spark, Hadoop, and Storm. The enhanced storage services require dedicated,
>> framework-specific bindings, which are currently available only for Spark.
>> We think that moving Crail to the Apache Incubator will help to extend
>> Crail’s support for different data processing frameworks.
>> 
>> Known Risks
>> 
>> To date, development has been sponsored by IBM and coordinated mostly by
>> the core team of researchers at the IBM Zurich Research Center. For Crail
>> to fully transition to an "Apache Way" governance model, it needs to start
>> embracing the meritocracy-centric way of growing the community of
>> contributors.
>> 
>> Orphaned Products
>> 
>> The Crail developers have a long-term interest in use and maintenance of
>> the code and there is also hope that growing a diverse community around the
>> project will become a guarantee against the project becoming orphaned. We
>> feel that it is also important to put formal governance in place both for
>> the project and the contributors as the project expands. We feel ASF is the
>> best location for this.
>> 
>> Inexperience with Open Source
>> 
>> Several of the initial committers are experienced open source developers
>> (Linux Kernel, DPDK, etc.).
>> 
>> Relationships with Other Apache Products
>> 
>> As of now, Crail has been tested with Spark, Hadoop and Hive, but it is
>> designed to integrate with any of the Apache data processing frameworks.
>> 
>> Homogeneous Developers
>> 
>> The project already has a diverse developer base including contributions
>> from organizations and public developers.
>> 
>> An Excessive Fascination with the Apache Brand
>> 
>> Crail solves a real need for a generic approach to leverage modern network
>> and storage hardware effectively in the Apache Hadoop and Spark ecosystems.
>> Our rationale for developing Crail as an Apache project is detailed in the
>> Rationale section. We believe that the Apache brand and community process
>> will help us engage a larger community and facilitate closer ties
>> with various Apache data processing projects.
>> 
>> Documentation
>> 
>> Documentation regarding Crail is available at www.crail.io
>> 
>> Initial Source
>> 
>> Initial source is available on GitHub under the Apache License 2.0:
>> 
>> https://github.com/zrlio/crail
>> 
>> External Dependencies
>> 
>> Crail is written in Java and currently supports Apache Hadoop MapReduce
>> and Apache Spark runtimes. To the best of our knowledge, all dependencies
>> of Crail are distributed under Apache compatible licenses.
>> 
>> Required Resources
>> 
>> Mailing lists
>> 
>> private@crail.incubator.apache.org
>> dev@crail.incubator.apache.org
>> commits@crail.incubator.apache.org
>> 
>> Git repository
>> 
>> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
>> 
>> Issue Tracking
>> 
>> JIRA (Crail)
>> 
>> Initial Committers
>> 
>> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
>> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
>> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
>> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
>> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
>> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
>> Patrick McArthur <patrick AT patrickmcarthur DOT net>
>> Ana Klimovic <anakli AT stanford DOT edu>
>> Yuval Degani <yuvaldeg AT mellanox DOT com>
>> Vu Pham <vuhuong AT mellanox DOT com>
>> 
>> Affiliations
>> 
>> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
>> Michael Kaufmann, Adrian Schuepbach)
>> University of New Hampshire (Patrick McArthur)
>> Stanford University (Ana Klimovic)
>> Mellanox (Yuval Degani, Vu Pham)
>> 
>> Sponsors
>> 
>> Champion
>> 
>> Luciano Resende <lresende AT apache DOT org>
>> 
>> Nominated Mentors
>> 
>> Luciano Resende <lresende AT apache DOT org>
>> 
>> Raphael Bircher <rbircher AT apache DOT org>
>> 
>> Julian Hyde <jhyde AT apache DOT org>
>> 
>> Sponsoring Entity
>> 
>> We would like to propose the Apache Incubator to sponsor this project.
>> 
>> 
>> 
> 
> The vote has passed with 5 binding +1 votes from:
> 
> Luciano Resende
> Julian Hyde
> Raphael Bircher
> Willem Jiang
> Dave Fisher
> 
> and 5 non-binding +1 votes from:
> 
> Clebert Suconic
> Gang(Gary) Wang
> Debo Dutta (dedutta)
> Kacie Karo
> Pierre Smits
> 
> Thanks and Welcome to the Apache Incubator.
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

Craig L Russell
Secretary, Apache Software Foundation
clr@apache.org http://db.apache.org/jdo


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

