incubator-general mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: [PROPOSAL] Proposing Giraph for the Apache Incubator
Date Fri, 22 Jul 2011 17:36:36 GMT
Sounds good to me. Thanks for your reply Avery.

- Henry

On Thu, Jul 21, 2011 at 4:39 PM, Avery Ching <aching@yahoo-inc.com> wrote:
> Henry,
>
> While we haven't yet done much work on a generic library, the intent is to provide generic
vertex input/output formats, aggregators, combiners, and graph computations that make it very
easy for a user to get started right away.  None of these need to be explicitly integrated
with Hadoop or Hadoop objects.  That being said, we provide users the ability to use existing
Hadoop Writable implementations, such as IntWritable, FloatWritable, etc. to make their lives
easier rather than reimplementing those basic types.  Similarly, the methods of VertexInputFormat/VertexOutputFormat
need not be implemented using an underlying Hadoop InputFormat/OutputFormat, but they are
similar to make it easy to do so if desired.
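As a rough illustration of the Writable contract mentioned above, here is a minimal sketch. It assumes only that Hadoop's Writable interface consists of `write(DataOutput)` and `readFields(DataInput)`; the `Writable` interface below is a local stand-in so the example runs without Hadoop on the classpath, and `WritableSketch`, `IntValue`, `serialize`, and `deserialize` are hypothetical names, not part of Giraph or Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {

    /** Local stand-in mirroring the two methods of Hadoop's Writable interface. */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical vertex value type; Hadoop's IntWritable plays this role in practice. */
    static final class IntValue implements Writable {
        int value;

        IntValue() { }

        IntValue(int value) { this.value = value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    /** Serializes any Writable to a byte array. */
    static byte[] serialize(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Fills an existing Writable instance from previously serialized bytes. */
    static void deserialize(byte[] data, Writable target) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IntValue original = new IntValue(42);
        IntValue copy = new IntValue();
        deserialize(serialize(original), copy);
        System.out.println("round-tripped value: " + copy.value);
    }
}
```

Reusing an existing type such as IntWritable saves writing this boilerplate for every basic vertex, edge, or message value.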
>
> Hope that answers your question,
>
> Avery
>
> On Jul 21, 2011, at 4:09 PM, Henry Saputra wrote:
>
>> Will the library of generic graph algorithms be tightly coupled with the
>> Hadoop integration piece?
>>
>> - Henry
>>
>> On Fri, Jul 15, 2011 at 11:14 AM, Avery Ching <aching@yahoo-inc.com> wrote:
>>> Hi,
>>>
>>> I would like to propose Giraph as an Apache Incubator project.  Giraph is a
large-scale graph processing infrastructure (inspired by Pregel) that runs entirely on Hadoop.
 Giraph applications and MapReduce jobs coexist on shared Hadoop instances and Giraph applications
can be part of Oozie workflows as a normal MapReduce job.
>>>
>>> Here is a link to the proposal in our GitHub wiki:
>>>
>>> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>>>
>>> The proposal is also inlined below:
>>>
>>> Thanks!
>>>
>>> Avery
>>>
>>>
>>>
>>> = Giraph : Large-scale graph processing on Hadoop =
>>>
>>> == Abstract ==
>>>
>>> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based
graph processing framework.
>>>
>>> == Proposal ==
>>>
>>> Graph processing platforms to run large-scale algorithms (such as page rank,
shared connections, personalization-based popularity, etc.) have become quite popular.  Some
recent examples include Pregel and HaLoop.  For general-purpose big data computation, the
MapReduce computation model is widely adopted and the most deployed MapReduce infrastructure
is Apache Hadoop.  We have implemented a graph-processing framework that is launched as a
typical Hadoop MapReduce job to leverage existing Hadoop infrastructure, such as Amazon’s
EC2.  Giraph builds upon the graph-oriented nature of Pregel and adds fault tolerance
to the coordinator process by using ZooKeeper as its centralized coordination service.
 Giraph will also include a library of generic graph algorithms.
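
To make the Pregel-style BSP model concrete, the following is a minimal single-process sketch of vertex-centric supersteps, using the classic "propagate the maximum value" example. It is not Giraph's API: the class `BspMaxValueSketch`, the method `run`, and the map-based graph representation are assumptions made purely for illustration, and a real deployment would distribute vertices across workers with a synchronization barrier between supersteps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BspMaxValueSketch {

    /**
     * Runs supersteps until no messages are in flight.
     * edges maps a vertex id to its out-neighbour ids;
     * initialValues maps a vertex id to its starting value.
     */
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> edges,
                                     Map<Integer, Integer> initialValues) {
        Map<Integer, Integer> values = new HashMap<>(initialValues);
        // Messages delivered to each vertex at the start of the current superstep.
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        boolean firstSuperstep = true;

        while (firstSuperstep || !inbox.isEmpty()) {
            // Messages sent now become visible only in the next superstep.
            Map<Integer, List<Integer>> outbox = new HashMap<>();

            for (Map.Entry<Integer, Integer> entry : values.entrySet()) {
                int vertex = entry.getKey();
                int current = entry.getValue();
                int best = current;
                for (int message : inbox.getOrDefault(vertex, List.of())) {
                    best = Math.max(best, message);
                }
                boolean changed = best > current;
                if (changed) {
                    entry.setValue(best);
                }
                // A vertex sends in the first superstep or after learning a
                // larger value; otherwise it stays silent (votes to halt).
                if (firstSuperstep || changed) {
                    for (int neighbour : edges.getOrDefault(vertex, List.of())) {
                        outbox.computeIfAbsent(neighbour, k -> new ArrayList<>()).add(best);
                    }
                }
            }

            // Barrier: swap message buffers before the next superstep.
            inbox = outbox;
            firstSuperstep = false;
        }
        return values;
    }

    public static void main(String[] args) {
        // Chain 1 - 2 - 3, stored as directed edges in both directions.
        Map<Integer, List<Integer>> edges = new HashMap<>();
        edges.put(1, List.of(2));
        edges.put(2, List.of(1, 3));
        edges.put(3, List.of(2));

        Map<Integer, Integer> initial = new HashMap<>();
        initial.put(1, 3);
        initial.put(2, 6);
        initial.put(3, 2);

        System.out.println(run(edges, initial));
    }
}
```

The computation ends when every vertex has gone quiet and no messages remain, which is the same halting condition the Pregel model uses.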
>>>
>>> == Background ==
>>>
>>> Giraph began development as a side project at Yahoo! at the end
of 2010.  It was functional within a month, and various features were added afterward.  Development
has focused on internal customers' needs until this point.
>>>
>>> == Rationale ==
>>>
>>> Web and online social graphs have been rapidly growing in size and scale during
the past decade.  In 2008, Google estimated that the number of web pages reached over a trillion.
 Online social networking and email sites, including Yahoo!, Google, Microsoft, Facebook,
LinkedIn, and Twitter, have hundreds of millions of users and are expected to grow much more
in the future.  Processing these graphs plays a big role in delivering relevant and personalized information
to users, such as results from a search engine or news on an online social networking site.
>>>
>>> == Initial Goals ==
>>>
>>> At this point, most of the functionality has been implemented and we are looking
to get more adoption and contributions from users outside Yahoo!.   We want to ensure that
performance scales and that the code is robust and fault tolerant.
>>>
>>> == Current Status ==
>>>
>>> === Meritocracy ===
>>>
>>> Giraph was initially developed by Avery Ching and Christian Kunz beginning in
December 2010 at Yahoo!.  There are other developers using Giraph at Yahoo! that are making
suggestions and adding code.  We are reaching out to other folks at social networking companies
for additional usage and development.
>>>
>>> === Community ===
>>>
>>> Several groups who are interested in either joining our project or using our
code have contacted us.  We certainly believe that there is a lot of interest and are actively
looking to improve and expand the community.
>>>
>>> === Core Developers ===
>>>
>>> Avery Ching: Wrote a majority of the code
>>> Christian Kunz: Wrote most of the communication code and security integration
with Hadoop
>>>
>>> === Alignment ===
>>>
>>> Giraph uses several Apache projects as its underlying infrastructure (Hadoop
and ZooKeeper).  It is also built with Apache Maven.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned products ===
>>>
>>> There are many social networking companies that would be interested in using
this graph-processing framework and we have already received interest from some of them.  Yahoo!
is already using this code in production and will certainly continue to use it in the future
as well.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> While the initial developers have limited experience contributing to open-source
projects, Yahoo! as a company has a strong commitment to open source and we have several advisors
whom we can ask for help.
>>>
>>> === Homogeneous Developers ===
>>>
>>> At this time, the project is relatively young and the developers work at only
two companies (Yahoo! and Jybe).  However, given the interest we have seen in the project,
we expect the diversity to improve in the near future.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Currently Giraph is being developed with a combination of salaried and volunteer
time.  We expect that other corporations will take an interest in this project and likely
contribute salaried developers.  Some individuals will likely spend volunteer time on
it as well.  It is still early in the project and we are hoping for a lot of growth.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons, etc.
 It is built using Apache Maven.
>>>
>>> Giraph has some overlapping functionality with Apache Hama.  However, there
are some significant differences.  Giraph focuses on graph-based bulk synchronous parallel
(BSP) computing, while Apache Hama targets more general-purpose BSP computing.  Giraph runs
on the Hadoop infrastructure, while Apache Hama uses its own computing framework.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> The Apache brand is likely to help us find contributors.  However, our interest
in Apache is primarily because the other projects that we depend on are also Apache projects
and it makes sense that all this software be available from the same place.
>>>
>>> === Documentation ===
>>>
>>> Currently we have little documentation, but several examples.  We are working
on improving this situation.
>>>
>>> === Initial Source ===
>>>
>>> The initial source of the code is from Yahoo! and began development in December
2010.  It is already available on GitHub at https://github.com/aching/Giraph.
>>>
>>> === Source and Intellectual Property Submission Plan ===
>>>
>>> We intend the entire code base to be licensed under the Apache License, Version
2.0.
>>>
>>> === External Dependencies ===
>>>
>>> All required dependencies have Apache-compatible licenses.  The
components with non-Apache licenses are listed below:
>>> * JSON – Public Domain
>>>
>>> === Cryptography ===
>>>
>>> Giraph depends on secure Hadoop that can optionally use Kerberos.
>>>
>>> == Required Resources ==
>>>
>>> === Mailing lists ===
>>>
>>> * giraph-private (with moderated subscriptions)
>>> * giraph-dev
>>> * giraph-commits
>>> * giraph-users
>>>
>>> === Subversion Directory ===
>>>
>>> https://svn.apache.org/repos/asf/incubator/giraph
>>>
>>> === Issue Tracking ===
>>>
>>> JIRA Giraph (GIRAPH)
>>>
>>> === Other Resources ===
>>>
>>> Giraph has integration tests that can be run with the LocalJobRunner.  These
same tests are also designed to be run on a small (even single-node) Hadoop cluster.  While not
required at this time, it would be nice if such a resource were available.
>>>
>>> === Initial Committers ===
>>>
>>> Avery Ching, aching at yahoo-inc dot com
>>> Christian Kunz, christian at jybe-inc dot com
>>> Owen O’Malley, owen at hortonworks dot com
>>>
>>> === Affiliations ===
>>>
>>> Avery Ching, Yahoo!
>>> Christian Kunz, Jybe
>>>
>>> == Sponsors ==
>>>
>>> === Champion ===
>>>
>>> Owen O’Malley
>>>
>>> === Nominated Mentors ===
>>>
>>> Owen O’Malley
>>>
>>> === Sponsoring Entity ===
>>>
>>> Apache Incubator PMC
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>

