Date: Sat, 23 Jul 2011 08:27:06 +0530
Subject: Re: [VOTE] Giraph to join the incubator
From: Ashish
To: general@incubator.apache.org

+1

On Sat, Jul 23, 2011 at 2:00 AM, Avery Ching wrote:

> Hi and good Friday to you all,
>
> It's been a week since we submitted our proposal for Giraph's inclusion
> into the Apache incubator and the discussion around the proposal seems to
> have settled. Thank you for all the comments/questions/general interest and
> to those who volunteered to be committers. At this time, I'd like to ask
> for a vote.
>
> The latest proposal can be found at the end of this email and in the
> following wiki:
>
> http://wiki.apache.org/incubator/GiraphProposal
>
> The discussion regarding the proposal can be found below:
>
> http://www.mail-archive.com/general@incubator.apache.org/msg29957.html
>
> Please cast your votes:
>
> [ ] +1 Accept Giraph for incubation
> [ ] +0 Indifferent to Giraph incubation
> [ ] -1 Reject Giraph for incubation
>
> This vote will close 72 hours from now.
>
> Thanks!
>
> Avery
>
> = Giraph : Large-scale graph processing on Hadoop =
>
> == Abstract ==
>
> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
> (BSP)-based graph processing framework.
>
> == Proposal ==
>
> Graph processing platforms to run large-scale algorithms (such as page
> rank, shared connections, personalization-based popularity, etc.) have
> become quite popular. Some recent examples include Pregel and HaLoop.
> For general-purpose big data computation, the MapReduce computation model is
> widely adopted and the most deployed MapReduce infrastructure is Apache
> Hadoop. We have implemented a graph-processing framework that is launched
> as a typical Hadoop MapReduce job to leverage existing Hadoop
> infrastructure, such as Amazon’s EC2. Giraph builds upon the graph-oriented
> nature of Pregel but additionally adds fault-tolerance to the coordinator
> process with the use of ZooKeeper as its centralized coordination service.
> Additionally, Giraph will include a library of generic graph algorithms.
>
> == Background ==
>
> Giraph initially began development as a side project at Yahoo! at the
> end of 2010. It was made functional in a month, and various features were
> then added. Development has been focused on internal customers' needs
> until this point.
>
> == Rationale ==
>
> Web and online social graphs have been rapidly growing in size and scale
> during the past decade. In 2008, Google estimated that the number of web
> pages reached over a trillion. Online social networking and email sites,
> including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have
> hundreds of millions of users and are expected to grow much more in the
> future. Processing these graphs plays a big role in delivering relevant and
> personalized information to users, such as results from a search engine or
> news in an online social networking site.
>
> == Initial Goals ==
>
> At this point, most of the functionality has been implemented and we are
> looking to get more adoption and contributions from users outside Yahoo!.
> We want to ensure that performance scales and that the code is robust and
> fault tolerant.
>
> == Current Status ==
>
> === Meritocracy ===
>
> Giraph was initially developed by Avery Ching and Christian Kunz beginning
> in December 2010 at Yahoo!. There are other developers using Giraph at
> Yahoo!
> who are making suggestions and adding code. We are reaching out to
> other folks at social networking companies for additional usage and
> development.
>
> === Community ===
>
> Several groups who are interested in either joining our project or using
> our code have contacted us. We certainly believe that there is a lot of
> interest and are actively looking to improve and expand the community.
>
> === Core Developers ===
>
> * Avery Ching: Wrote a majority of the code
> * Christian Kunz: Wrote most of the communication code and security
>   integration with Hadoop
>
> === Alignment ===
>
> Giraph uses several Apache projects as its underlying infrastructure
> (Hadoop and ZooKeeper). It also builds on Apache Maven.
>
> == Known Risks ==
>
> === Orphaned products ===
>
> There are many social networking companies that would be interested in
> using this graph-processing framework and we have already received interest
> from some of them. Yahoo! is already using this code in production and will
> certainly continue to use it in the future as well.
>
> === Inexperience with Open Source ===
>
> While the initial developers have limited experience in contributing to
> open-source projects, Yahoo! as a company has a strong commitment to
> open-source and we have several advisors that we can ask for help.
>
> === Homogenous Developers ===
>
> At this time, the project is relatively young and the developers work at
> only two companies (Yahoo! and Jybe). However, given the interest we have
> seen in the project, we expect the diversity to improve in the near future.
>
> === Reliance on Salaried Developers ===
>
> Currently Giraph is being developed by a combination of salaried and
> volunteer time. We expect that other corporations will take an interest in
> this project and likely contribute with salaried developers.
> Some individuals will likely spend volunteer time on it as well. It is still
> early in the project and we are hoping for a lot of growth.
>
> === Relationships with Other Apache Products ===
>
> Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons,
> etc. It is built using Apache Maven.
>
> Giraph has some overlapping functionality with Apache Hama. However, there
> are some significant differences. Giraph focuses on graph-based bulk
> synchronous parallel (BSP) computing, while Apache Hama is more for
> general-purpose BSP computing. Giraph runs on the Hadoop infrastructure,
> while Apache Hama uses its own computing framework.
>
> === An Excessive Fascination with the Apache Brand ===
>
> The Apache brand is likely to help us find contributors; however, our
> interest in Apache is primarily because the other projects that we depend
> on are also Apache projects and it makes sense that all this software be
> available from the same place.
>
> === Documentation ===
>
> Currently we have little documentation, but several examples. We are
> working on improving this situation.
>
> === Initial Source ===
>
> The initial source of the code is from Yahoo! and began development in
> December 2010. It is already available on GitHub at
> https://github.com/aching/Giraph.
>
> === Source and Intellectual Property Submission Plan ===
>
> We intend the entire code base to be licensed under the Apache License,
> Version 2.0.
>
> === External Dependencies ===
>
> The required dependencies all have Apache-compatible licenses. The
> components with non-Apache licenses are:
> * JSON – Public Domain
>
> === Cryptography ===
>
> Giraph depends on secure Hadoop, which can optionally use Kerberos.
>
> == Required Resources ==
>
> === Mailing lists ===
>
> * giraph-private (with moderated subscriptions)
> * giraph-dev
> * giraph-commits
> * giraph-users
>
> === Subversion Directory ===
>
> https://svn.apache.org/repos/asf/incubator/giraph
>
> === Issue Tracking ===
>
> JIRA Giraph (GIRAPH)
>
> === Other Resources ===
>
> Giraph has integration tests that can be run with the LocalJobRunner.
> These same tests are also designed to be run on a small (even single node)
> Hadoop cluster. While not required at this time, it would be nice if such a
> resource were available.
>
> === Initial Committers ===
>
> * Avery Ching, aching at yahoo-inc dot com
> * Christian Kunz, christian at jybe-inc dot com
> * Owen O’Malley, owen at hortonworks dot com
> * Phillip Rhodes, prhodes at apache dot org
> * Hyunsik Choi, hyunsik at apache dot org
> * Jakob Homan, jghoman at apache dot org
> * Arun Suresh, asuresh at yahoo-inc dot com
>
> === Affiliations ===
>
> * Avery Ching, Yahoo!
> * Christian Kunz, Jybe
> * Owen O'Malley, Hortonworks
> * Phillip Rhodes, Fogbeam Labs
> * Hyunsik Choi, Database Lab, Korea University
> * Jakob Homan, LinkedIn
> * Arun Suresh, Yahoo!
>
> == Sponsors ==
>
> === Champion ===
>
> Owen O’Malley
>
> === Nominated Mentors ===
>
> Owen O’Malley
>
> === Sponsoring Entity ===
>
> Apache Incubator PMC
>
>

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal