From: Julien Vermillard
Date: Wed, 8 Jun 2011 14:02:45 +0200
Subject: Re: [VOTE] Flume to join the Incubator.
To: general@incubator.apache.org

+1 (binding) and good luck :)

On Wed, Jun 8, 2011 at 1:22 PM, Sanjiva Weerawarana wrote:
> +1 (non-binding).
>
> On Wed, Jun 8, 2011 at 10:08 AM, Jonathan Hsieh wrote:
>
>> Hi all,
>>
>> Since there have been no new conversations on this Flume [PROPOSAL] thread,
>> I'd like to call a vote.
>>
>> At the end of this mail, I've put a copy of the current proposal. Here is
>> a link to the document in the wiki:
>> http://wiki.apache.org/incubator/FlumeProposal
>>
>> And here is a link to the discussion thread:
>> http://www.mail-archive.com/general@incubator.apache.org/msg27722.html
>>
>> Please cast your votes:
>>
>> [ ] +1 Accept Flume for incubation
>> [ ] +0 Indifferent to Flume incubation
>> [ ] -1 Reject Flume for incubation
>>
>> This vote will close 72 hours from now.
>>
>> Thanks,
>> Jon.
>>
>> ----
>>
>> = Flume - A Distributed Log Collection System =
>>
>> == Abstract ==
>>
>> Flume is a distributed, reliable, and available system for efficiently
>> collecting, aggregating, and moving large amounts of log data to scalable
>> data storage systems such as Apache Hadoop's HDFS.
>>
>> == Proposal ==
>>
>> Flume is a distributed, reliable, and available system for efficiently
>> collecting, aggregating, and moving large amounts of log data from many
>> different sources to a centralized data store. Its main goal is to deliver
>> data from applications to Hadoop's HDFS. It has a simple and flexible
>> architecture for transporting streaming event data via Flume nodes to the
>> data store. It is robust and fault-tolerant, with tunable reliability
>> mechanisms that rely upon many failover and recovery mechanisms. The system
>> is centrally configured and allows for intelligent dynamic management. It
>> uses a simple extensible data model that allows for lightweight online
>> analytic applications. It provides a pluggable mechanism by which new
>> sources, destinations, and analytic functions can be integrated within a
>> Flume pipeline.
>>
>> == Background ==
>>
>> Flume was initially developed by Cloudera to enable reliable and simplified
>> collection of log information from many distributed sources. It was later
>> open-sourced by Cloudera on GitHub as an Apache 2.0 licensed project in
>> June 2010. Since then, Flume has been formally released five times, as
>> versions 0.9.0 (June 2010), 0.9.1 (Aug 2010), 0.9.1u1 (Oct 2010), 0.9.2
>> (Nov 2010), and 0.9.3 (Feb 2011). These releases are also distributed by
>> Cloudera as source and binaries, along with enhancements, as part of the
>> Cloudera Distribution including Apache Hadoop (CDH).
>>
>> == Rationale ==
>>
>> Collecting log information in a data center in a timely, reliable, and
>> efficient manner is difficult but important, because when aggregated and
>> analyzed, log information can yield valuable business insights.
>> We believe that users and operators need a manageable, systematic
>> approach to log collection that simplifies the creation, monitoring, and
>> administration of reliable log data pipelines. Today, this collection is
>> often attempted by periodically shipping data in batches and by using
>> potentially unreliable and inefficient ad-hoc methods.
>>
>> Log data is typically generated in various systems running within a data
>> center that can range from a few machines to hundreds of machines. In
>> aggregate, the data acts like a large-volume continuous stream whose
>> contents can vary widely in both format and content. The volume and
>> variety of raw log data make Apache Hadoop's HDFS file system an ideal
>> storage location before the eventual analysis. Unfortunately, HDFS has
>> limitations with regard to durability, as well as scaling limitations when
>> handling a large number of low-bandwidth connections or small files.
>> Similar technical challenges arise when attempting to write data to other
>> data storage services.
>>
>> Flume addresses these challenges by providing a reliable, scalable,
>> manageable, and extensible solution. It uses a streaming design for
>> capturing and aggregating log information from varied sources in a
>> distributed environment, and it has centralized management features for
>> minimal configuration and management overhead.
>>
>> == Initial Goals ==
>>
>> Flume is currently in its first major release, with a considerable number
>> of enhancement requests, tasks, and issues recorded towards its future
>> development. The initial goal of this project will be to continue to build
>> community in the spirit of the "Apache Way", and to address the most
>> requested features and bug fixes towards the next dot release.
>>
>> Some goals include:
>>  * Standing up a sustaining Apache-based community around the Flume
>>    codebase.
>>  * Implementing core functionality of a usable, highly available Flume
>>    master.
>>  * Performance, usability, and robustness improvements.
>>  * Improving the ability to monitor and diagnose problems as data is
>>    transported.
>>  * Providing a centralized place for contributed connectors and related
>>    projects.
>>
>> = Current Status =
>>
>> == Meritocracy ==
>>
>> Flume was initially developed by Jonathan Hsieh in July 2009 along with a
>> development team at Cloudera. Developers external to Cloudera provided
>> feedback, suggested features and fixes, and implemented extensions of
>> Flume. The Cloudera engineering team has since maintained the project,
>> with Jonathan Hsieh, Henry Robinson, and Patrick Hunt dedicated to its
>> improvement. Contributors to Flume and its connectors include developers
>> from different companies and different parts of the world.
>>
>> == Community ==
>>
>> Flume is currently used by a number of organizations all over the world.
>> Flume has an active and growing user and developer community with active
>> participation in the
>> [[https://groups.google.com/a/cloudera.org/group/flume-user/topics|user]]
>> and
>> [[https://groups.google.com/a/cloudera.org/group/flume-dev/topics|developer]]
>> mailing lists. The users and developers also communicate via IRC on
>> #flume at irc.freenode.net.
>>
>> Since the project was open sourced, over 15 different people from diverse
>> organizations have contributed code. During this period, the project team
>> has hosted open, in-person, quarterly meetups to discuss new features, new
>> designs, and new use-case stories.
>>
>> == Core Developers ==
>>
>> The core developers of the Flume project are:
>>  * Andrew Bayer: Andrew has a lot of expertise with build tools,
>>    specifically Jenkins continuous integration and Maven.
>>  * Jonathan Hsieh: Jonathan designed and implemented much of the original
>>    code.
>>  * Patrick Hunt: Patrick has improved the web interfaces of Flume
>>    components and contributed several build quality improvements.
>>  * Bruce Mitchener: Bruce has improved the internal logging infrastructure
>>    and edited significant portions of the Flume manual.
>>  * Henry Robinson: Henry has implemented much of the ZooKeeper integration
>>    and the plugin mechanisms, as well as several Flume features and bug
>>    fixes.
>>  * Eric Sammer: Eric has implemented the Maven build, as well as several
>>    Flume features and bug fixes.
>>
>> All core developers of the Flume project have contributed to Hadoop or
>> related Apache projects and are very familiar with Apache principles and
>> philosophy for community-driven software development.
>>
>> == Alignment ==
>>
>> Flume complements Hadoop MapReduce, Pig, Hive, and HBase by providing a
>> robust mechanism for integrating log data from external systems for
>> effective analysis. Its design enables efficient integration of newly
>> ingested data into Hive's data warehouse.
>>
>> Flume's architecture is open and easily extensible. This has encouraged
>> many users to contribute plugins that integrate with other projects. For
>> example, several users have contributed connectors to message queuing and
>> bus services, to several open source data stores, to incremental search
>> indexes, and to stream analysis engines.
>>
>> = Known Risks =
>>
>> == Orphaned Products ==
>>
>> Flume is already deployed in production at multiple companies, and they
>> are actively participating in feature requests and user-led discussions.
>> Flume is getting traction with developers, and thus the risk of it being
>> orphaned is minimal.
>>
>> == Inexperience with Open Source ==
>>
>> All code developed for Flume has been open sourced by Cloudera under the
>> Apache 2.0 license.
>> All committers of the Flume project are intimately familiar with the
>> Apache model for open-source development and are experienced in working
>> with new contributors.
>>
>> == Homogeneous Developers ==
>>
>> The initial set of committers comes from a small number of organizations.
>> However, we expect that once approved for incubation, the project will
>> attract new contributors from diverse organizations and will thus grow
>> organically. The participation of developers from several different
>> organizations in the mailing list strongly supports this expectation.
>>
>> == Reliance on Salaried Developers ==
>>
>> It is expected that Flume will be developed on salaried and volunteer
>> time, although all of the initial developers will work on it mainly on
>> salaried time.
>>
>> == Relationships with Other Apache Products ==
>>
>> Flume depends upon other Apache projects: Apache Hadoop, Apache Log4J,
>> Apache ZooKeeper, Apache Thrift, Apache Avro, and multiple Apache Commons
>> components. Its build depends upon Apache Ant and Apache Maven.
>>
>> Flume users have created connectors that interact with several other
>> Apache projects, including Apache HBase and Apache Cassandra.
>>
>> Flume's functionality has some direct or indirect overlap with the
>> functionality of Apache Chukwa, but the two systems have several
>> significant architectural differences. Both can be used to collect log
>> data and write it to HDFS. However, Chukwa's primary goals are the
>> analytic and monitoring aspects of a Hadoop cluster. Instead of focusing
>> on analytics, Flume focuses primarily upon data transport and integration
>> with a wide set of data sources and data destinations. Architecturally,
>> Chukwa components are individually and statically configured. Chukwa also
>> depends upon Hadoop MapReduce for its core functionality.
>> In contrast, Flume's components are dynamically and centrally configured
>> and do not depend directly upon Hadoop MapReduce. Furthermore, Flume
>> provides a more general model for handling data and enables integration
>> with projects such as Apache Hive, data stores such as Apache HBase,
>> Apache Cassandra, and Voldemort, and several Apache Lucene-related
>> projects.
>>
>> == An Excessive Fascination with the Apache Brand ==
>>
>> We would like Flume to become an Apache project to further foster a
>> healthy community of contributors and consumers around the project. Since
>> Flume directly interacts with many Apache Hadoop-related projects and
>> solves an important problem for many Hadoop users, residing in the Apache
>> Software Foundation will increase interaction with the larger community.
>>
>> = Documentation =
>>
>>  * All Flume documentation (User Guide, Developer Guide, Cookbook, and
>>    Windows Guide) is maintained within the Flume sources and can be built
>>    directly.
>>  * Cloudera provides documentation specific to its distribution of Flume
>>    at: http://archive.cloudera.com/cdh/3/flume/
>>  * Flume wiki at GitHub: https://github.com/cloudera/flume/wiki
>>  * Flume JIRA at Cloudera: https://issues.cloudera.org/browse/flume
>>
>> = Initial Source =
>>
>>  * https://github.com/cloudera/flume/tree/
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>>  * The initial source is already licensed under the Apache License,
>>    Version 2.0. https://github.com/cloudera/flume/blob/master/LICENSE
>>
>> == External Dependencies ==
>>
>> The required external dependencies are all under the Apache License or
>> compatible licenses.
>> The following components have non-Apache licenses:
>>
>>  * org.arabidopsis.ahocorasick: BSD-style
>>
>> The non-Apache build tools used by Flume are as follows:
>>
>>  * AsciiDoc: GNU GPLv2
>>  * FindBugs: GNU LGPL
>>  * Cobertura: GNU GPLv2
>>  * PMD: BSD-style
>>
>> == Cryptography ==
>>
>> Flume uses standard APIs and tools for SSH and SSL communication where
>> necessary.
>>
>> = Required Resources =
>>
>> == Mailing lists ==
>>
>>  * flume-private (with moderated subscriptions)
>>  * flume-dev
>>  * flume-commits
>>  * flume-user
>>
>> == Subversion Directory ==
>>
>> https://svn.apache.org/repos/asf/incubator/flume
>>
>> == Issue Tracking ==
>>
>> JIRA Flume (FLUME)
>>
>> == Other Resources ==
>>
>> The existing code already has unit and integration tests, so we would
>> like a Jenkins instance to run them whenever a new patch is submitted.
>> This can be added after project creation.
>>
>> = Initial Committers =
>>
>>  * Andrew Bayer (abayer at cloudera dot com)
>>  * Jonathan Hsieh (jon at cloudera dot com)
>>  * Patrick Hunt (phunt at cloudera dot com)
>>  * Aaron Kimball (akimball83 at gmail dot com)
>>  * Bruce Mitchener (bruce.mitchener at gmail dot com)
>>  * Arvind Prabhakar (arvind at cloudera dot com)
>>  * Ahmed Radwan (ahmed at cloudera dot com)
>>  * Henry Robinson (henry at cloudera dot com)
>>  * Eric Sammer (esammer at cloudera dot com)
>>  * Derek Deeter (ddeeterctrb at gmail dot com)
>>
>> = Affiliations =
>>
>>  * Andrew Bayer, Cloudera
>>  * Jonathan Hsieh, Cloudera
>>  * Patrick Hunt, Cloudera
>>  * Aaron Kimball, Odiago
>>  * Bruce Mitchener, Independent
>>  * Arvind Prabhakar, Cloudera
>>  * Ahmed Radwan, Cloudera
>>  * Henry Robinson, Cloudera
>>  * Eric Sammer, Cloudera
>>  * Derek Deeter, Intuit
>>
>> = Sponsors =
>>
>> == Champion ==
>>
>>  * Nigel Daley
>>
>> == Nominated Mentors ==
>>
>>  * Tom White
>>  * Nigel Daley
>>  * Ralph Goers
>>  * Patrick Hunt
>>
>> == Sponsoring Entity ==
>>
>>  * Apache Incubator PMC
>>
>> --
>> // Jonathan Hsieh (shay)
>> // Software Engineer, Cloudera
>> // jon@cloudera.com
>>
>
> --
> Sanjiva Weerawarana, Ph.D.
> Founder, Director & Chief Scientist; Lanka Software Foundation;
> http://www.opensource.lk/
> Founder, Chairman & CEO; WSO2; http://wso2.com/
> Founder & Director; Thinkcube Systems; http://www.thinkcube.com/
> Member; Apache Software Foundation; http://www.apache.org/
> Visiting Lecturer; University of Moratuwa; http://www.cse.mrt.ac.lk/
>
> Blog: http://sanjiva.weerawarana.org/