From: Tom White
Date: Wed, 8 Jun 2011 09:33:10 -0700
Subject: Re: [VOTE] Flume to join the Incubator.
To: general@incubator.apache.org

+1

Tom

On Tue, Jun 7, 2011 at 9:38 PM, Jonathan Hsieh wrote:
> Hi all,
>
> Since there have been no new conversations on this Flume [PROPOSAL] thread, I'd like to call a vote.
>
> At the end of this mail, I've put a copy of the current proposal.
> Here is a link to the document in the wiki:
> http://wiki.apache.org/incubator/FlumeProposal
>
> And here is a link to the discussion thread:
> http://www.mail-archive.com/general@incubator.apache.org/msg27722.html
>
> Please cast your votes:
>
> [  ] +1 Accept Flume for incubation
> [  ] +0 Indifferent to Flume incubation
> [  ] -1 Reject Flume for incubation
>
> This vote will close 72 hours from now.
>
> Thanks,
> Jon.
>
> ----
>
> = Flume - A Distributed Log Collection System =
>
> == Abstract ==
>
> Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data to scalable data storage systems such as Apache Hadoop's HDFS.
>
> == Proposal ==
>
> Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. Its main goal is to deliver data from applications to Hadoop's HDFS. It has a simple and flexible architecture for transporting streaming event data via Flume nodes to the data store. It is robust and fault-tolerant, with tunable reliability and many failover and recovery mechanisms. The system is centrally configured and allows for intelligent dynamic management. It uses a simple, extensible data model that allows for lightweight online analytic applications. It provides a pluggable mechanism by which new sources, destinations, and analytic functions can be integrated into a Flume pipeline.
>
> == Background ==
>
> Flume was initially developed by Cloudera to enable reliable and simplified collection of log information from many distributed sources. It was later open-sourced by Cloudera on GitHub as an Apache 2.0 licensed project in June 2010.
> Since then, Flume has been formally released five times, as versions 0.9.0 (June 2010), 0.9.1 (Aug 2010), 0.9.1u1 (Oct 2010), 0.9.2 (Nov 2010), and 0.9.3 (Feb 2011). These releases are also distributed by Cloudera as source and binaries, along with enhancements, as part of Cloudera Distribution including Apache Hadoop (CDH).
>
> == Rationale ==
>
> Collecting log information in a data center in a timely, reliable, and efficient manner is a difficult but important challenge, because when aggregated and analyzed, log information can yield valuable business insights. We believe that users and operators need a manageable, systematic approach to log collection that simplifies the creation, monitoring, and administration of reliable log data pipelines. Oftentimes today, this collection is attempted by periodically shipping data in batches and by using potentially unreliable and inefficient ad-hoc methods.
>
> Log data is typically generated by various systems running within a data center that can range from a few machines to hundreds of machines. In aggregate, the data acts like a large-volume continuous stream whose contents are highly varied in both format and content. The volume and variety of raw log data make Apache Hadoop's HDFS file system an ideal storage location before the eventual analysis. Unfortunately, HDFS has limitations with regard to durability, as well as scaling limitations when handling a large number of low-bandwidth connections or small files. Similar technical challenges arise when writing data to other data storage services.
>
> Flume addresses these challenges by providing a reliable, scalable, manageable, and extensible solution.
> It uses a streaming design for capturing and aggregating log information from varied sources in a distributed environment, and it has centralized management features for minimal configuration and management overhead.
>
> == Initial Goals ==
>
> Flume is currently in its first major release, with a considerable number of enhancement requests, tasks, and issues recorded for its future development. The initial goal of this project will be to continue to build community in the spirit of the "Apache Way", and to address the most requested features and bug fixes for the next dot release.
>
> Some goals include:
>  * Standing up a sustaining Apache-based community around the Flume codebase.
>  * Implementing core functionality of a usable, highly available Flume master.
>  * Performance, usability, and robustness improvements.
>  * Improving the ability to monitor and diagnose problems as data is transported.
>  * Providing a centralized place for contributed connectors and related projects.
>
> = Current Status =
>
> == Meritocracy ==
>
> Flume was initially developed by Jonathan Hsieh in July 2009, along with the development team at Cloudera. Developers external to Cloudera provided feedback, suggested features and fixes, and implemented extensions of Flume. The Cloudera engineering team has since maintained the project, with Jonathan Hsieh, Henry Robinson, and Patrick Hunt dedicated to its improvement. Contributors to Flume and its connectors include developers from different companies and different parts of the world.
>
> == Community ==
>
> Flume is currently used by a number of organizations all over the world. Flume has an active and growing user and developer community, with active participation in the [[https://groups.google.com/a/cloudera.org/group/flume-user/topics|user]] and [[https://groups.google.com/a/cloudera.org/group/flume-dev/topics|developer]] mailing lists.
> The users and developers also communicate via IRC on #flume at irc.freenode.net.
>
> Since the project was open sourced, more than 15 people from diverse organizations have contributed code. During this period, the project team has hosted open, in-person, quarterly meetups to discuss new features, new designs, and new use-case stories.
>
> == Core Developers ==
>
> The core developers for the Flume project are:
>  * Andrew Bayer: Andrew has a lot of expertise with build tools, specifically Jenkins continuous integration and Maven.
>  * Jonathan Hsieh: Jonathan designed and implemented much of the original code.
>  * Patrick Hunt: Patrick has improved the web interfaces of Flume components and contributed several build quality improvements.
>  * Bruce Mitchener: Bruce has improved the internal logging infrastructure and has edited significant portions of the Flume manual.
>  * Henry Robinson: Henry has implemented much of the ZooKeeper integration and the plugin mechanisms, as well as several Flume features and bug fixes.
>  * Eric Sammer: Eric has implemented the Maven build, as well as several Flume features and bug fixes.
>
> All core developers of the Flume project have contributed to Hadoop or related Apache projects and are very familiar with the Apache principles and philosophy of community-driven software development.
>
> == Alignment ==
>
> Flume complements Hadoop MapReduce, Pig, Hive, and HBase by providing a robust mechanism for integrating log data from external systems for effective analysis. Its design enables efficient integration of newly ingested data into Hive's data warehouse.
>
> Flume's architecture is open and easily extensible. This has encouraged many users to contribute plugins that integrate Flume with other projects.
> For example, several users have contributed connectors to message queuing and bus services, to several open source data stores, to incremental search indexes, and to stream analysis engines.
>
> = Known Risks =
>
> == Orphaned Products ==
>
> Flume is already deployed in production at multiple companies, which are actively participating in feature requests and user-led discussions. Flume is gaining traction with developers, so the risk of it being orphaned is minimal.
>
> == Inexperience with Open Source ==
>
> All code developed for Flume has been open sourced by Cloudera under the Apache 2.0 license. All committers of the Flume project are intimately familiar with the Apache model for open-source development and are experienced in working with new contributors.
>
> == Homogeneous Developers ==
>
> The initial set of committers comes from a small set of organizations. However, we expect that once approved for incubation, the project will attract new contributors from diverse organizations and will thus grow organically. The participation of developers from several different organizations in the mailing lists is a strong indication of this.
>
> == Reliance on Salaried Developers ==
>
> It is expected that Flume will be developed on both salaried and volunteer time, although all of the initial developers will work on it mainly on salaried time.
>
> == Relationships with Other Apache Products ==
>
> Flume depends upon other Apache projects: Apache Hadoop, Apache Log4J, Apache ZooKeeper, Apache Thrift, Apache Avro, and multiple Apache Commons components. Its build depends upon Apache Ant and Apache Maven.
>
> Flume users have created connectors that interact with several other Apache projects, including Apache HBase and Apache Cassandra.
>
> Flume's functionality overlaps, directly or indirectly, with that of Apache Chukwa, but the two systems have several significant architectural differences. Both systems can be used to collect log data and write it to HDFS. However, Chukwa's primary goals are the analytic and monitoring aspects of a Hadoop cluster. Instead of focusing on analytics, Flume focuses primarily on data transport and integration with a wide set of data sources and data destinations. Architecturally, Chukwa components are individually and statically configured, and Chukwa depends upon Hadoop MapReduce for its core functionality. In contrast, Flume's components are dynamically and centrally configured, and Flume does not depend directly upon Hadoop MapReduce. Furthermore, Flume provides a more general model for handling data and enables integration with projects such as Apache Hive, data stores such as Apache HBase, Apache Cassandra, and Voldemort, and several Apache Lucene-related projects.
>
> == An Excessive Fascination with the Apache Brand ==
>
> We would like Flume to become an Apache project to further foster a healthy community of contributors and consumers around the project. Since Flume directly interacts with many Apache Hadoop-related projects and solves an important problem for many Hadoop users, residing in the Apache Software Foundation will increase its interaction with the larger community.
>
> = Documentation =
>
>  * All Flume documentation (User Guide, Developer Guide, Cookbook, and Windows Guide) is maintained within the Flume sources and can be built directly.
>  * Cloudera provides documentation specific to its distribution of Flume at: http://archive.cloudera.com/cdh/3/flume/
>  * Flume wiki at GitHub: https://github.com/cloudera/flume/wiki
>  * Flume JIRA at Cloudera: https://issues.cloudera.org/browse/flume
>
> = Initial Source =
>
>  * https://github.com/cloudera/flume/tree/
>
> == Source and Intellectual Property Submission Plan ==
>
>  * The initial source is already licensed under the Apache License, Version 2.0: https://github.com/cloudera/flume/blob/master/LICENSE
>
> == External Dependencies ==
>
> The required external dependencies are all under the Apache License or compatible licenses. The following components have non-Apache licenses:
>
>  * org.arabidopsis.ahocorasick: BSD-style
>
> The non-Apache build tools used by Flume are as follows:
>
>  * AsciiDoc: GNU GPLv2
>  * FindBugs: GNU LGPL
>  * Cobertura: GNU GPLv2
>  * PMD: BSD-style
>
> == Cryptography ==
>
> Flume uses standard APIs and tools for SSH and SSL communication where necessary.
>
> = Required Resources =
>
> == Mailing lists ==
>
>  * flume-private (with moderated subscriptions)
>  * flume-dev
>  * flume-commits
>  * flume-user
>
> == Subversion Directory ==
>
> https://svn.apache.org/repos/asf/incubator/flume
>
> == Issue Tracking ==
>
> JIRA Flume (FLUME)
>
> == Other Resources ==
>
> The existing code already has unit and integration tests, so we would like a Jenkins instance to run them whenever a new patch is submitted. This can be added after project creation.
>
> = Initial Committers =
>
>  * Andrew Bayer (abayer at cloudera dot com)
>  * Jonathan Hsieh (jon at cloudera dot com)
>  * Patrick Hunt (phunt at cloudera dot com)
>  * Aaron Kimball (akimball83 at gmail dot com)
>  * Bruce Mitchener (bruce.mitchener at gmail dot com)
>  * Arvind Prabhakar (arvind at cloudera dot com)
>  * Ahmed Radwan (ahmed at cloudera dot com)
>  * Henry Robinson (henry at cloudera dot com)
>  * Eric Sammer (esammer at cloudera dot com)
>  * Derek Deeter (ddeeterctrb at gmail dot com)
>
> = Affiliations =
>
>  * Andrew Bayer, Cloudera
>  * Jonathan Hsieh, Cloudera
>  * Patrick Hunt, Cloudera
>  * Aaron Kimball, Odiago
>  * Bruce Mitchener, Independent
>  * Arvind Prabhakar, Cloudera
>  * Ahmed Radwan, Cloudera
>  * Henry Robinson, Cloudera
>  * Eric Sammer, Cloudera
>  * Derek Deeter, Intuit
>
> = Sponsors =
>
> == Champion ==
>
>  * Nigel Daley
>
> == Nominated Mentors ==
>
>  * Tom White
>  * Nigel Daley
>  * Ralph Goers
>  * Patrick Hunt
>
> == Sponsoring Entity ==
>
>  * Apache Incubator PMC
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com