Mailing-List: contact hama-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hama-user@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <4C2E0EEC.3030205@tis.bz.it>
References: <AANLkTim6crQZkY9VxX0mkdnTECVyTLPNKowzDYZfWoIC@mail.gmail.com>
	<AANLkTinLZVGspiAM2zNhUdANKCbSH5SCUOn6aypUkz2z@mail.gmail.com>
	<AANLkTilnPqai1KG_ZObNlhmjkiNKlmL9rj1VW9t40sN5@mail.gmail.com>
	<4C2DCF64.1010507@tis.bz.it>
	<756a97.15822.1299329fcf3.Coremail.zercal@126.com>
	<4C2E0EEC.3030205@tis.bz.it>
Date: Mon, 5 Jul 2010 17:22:13 +0900
Message-ID: <AANLkTim39p5dQvZqxtcfJ2gP4UVS71_hwqtNB4Z7xSLs@mail.gmail.com>
Subject: Re: Pregel article
From: "Edward J. Yoon" <edwardyoon@apache.org>
To: hama-user@incubator.apache.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Wow, Thanks for sharing nice comment and blog post!! I'll look at it
soon and try to join to this discussion. :)

On Sat, Jul 3, 2010 at 1:08 AM, Claudio Martella
<claudio.martella@tis.bz.it> wrote:
> I'll try to make things clear.
>
> The end of a superstep is obtained by the un-activation of all the
> vertices. In pregel the superstep is over when all the vertices call
> VoteToHalt() (in hama it's done by the sync() method). This happens at
> the end of each computation by each vertex. Each vertex is activated by
> the arrival of a message directed to that vertex. This means that each
> superstep computation is atomic and it should be considered in the
> design of the algorithm. That's a change of paradigm and that's what
> pregel author's call the "vertex's perspective programming".
>
> So no, there's no assumption about all the vertices to be active at the
> beginning of each superstep.
>
> About Felix's argument: yes, fewer supersteps mean less communication
> and synchronization overhead. At the same time, having longer supersteps
> will mean that it's more probable that certain vertices end their
> computation earlier than others, making them idle for a long time
> (waiting for the others to finish), and loosing computational time. So
> ideally it should be a good balance between long computational
> supersteps (decreasing communication overhead) and short computational
> supersteps (decreasing idle time).
> This is an intrinsical problem of BSP models because of the barrier. On
> the contrary DataFlow models don't have barriers and each computation is
> more independent, therefore more similar to the model you have in mind.
>
> Hope this helps.
>
> I attach the text from my blog post (roughly obtained with html2text) as
> requested.
>
> Cheers,
>
> Claudio
>
>
> zercal wrote:
>> The paper I found about pregel are not very detailed,
>> is it "http://portal.acm.org/citation.cfm?id=3D1582716.1582723"?
>> I guess, in this paper, vertices are assumed to be all actived at every =
superstep. simply random
>> access will reduce some communication cost but take more superstep.
>> However, is there any way of vertice selection method can be performed
>> that at every super step, each vertex knows whether to active according =
to
>> information it kept and received from other vertex?
>> But I can't found more detail from that paper...
>> Becides, I can not access your blogs. Would you please send me your arti=
cle?
>>
>> Thank you very much!
>> from Xiong Chenyan
>>
>> =C3=94=C3=9A2010-07-02 20:04:40=C2=A3=C2=AC"Felix Halim" <felix.halim@gm=
ail.com> =C3=90=C2=B4=C2=B5=C3=80=C2=A3=C2=BA
>>
>>> Exactly how to activate a particular vertex is not clear from the
>>> paper (is it random access?) and this feature is probably not as good
>>> as it sounds for complex graph algorithms. It might be better off to
>>> assume all vertices are active (to reduce the overhead of the flag
>>> needed and the space to make it randomly accessible, by storing it in
>>> blocks or whatever).
>>>
>>> Here is my argument:
>>>
>>> The way Pregel (and existing MR) works is iterative, where each
>>> iteration is separated by a super-step barrier where all messages have
>>> to arrive. Algorithms that have fewer super-steps are preferable than
>>> those that have large number of super-steps. In fact, we should
>>> measure Algorithms in terms of the number of super-steps required. To
>>> minimize the number of super-steps, likely we need to activate as much
>>> vertices as possible to do all the work in current super-step, rather
>>> than spill-over to the next super-step. In this case, the feature to
>>> "turn off" vertices is useless, since most of the time all vertices
>>> will be active to effectively reduce the number of super-steps.
>>>
>>> Unfortunately, I don't have experiments to backup my argument... I
>>> don't have Pregel...
>>>
>>> Felix Halim
>>>
>>>
>>> On Fri, Jul 2, 2010 at 7:37 PM, Claudio Martella
>>> <claudio.martella@tis.bz.it> wrote:
>>>
>>>> I did too. See:
>>>>
>>>> http://blog.acaro.org/entry/pregel-is-out-but-what-is-pregel
>>>>
>>>>
>>>> Felix Halim wrote:
>>>>
>>>>> I have. See my comment in this blog:
>>>>>
>>>>> http://blog.udanax.org/2010/06/summary-of-google-pregel.html
>>>>>
>>>>> Felix Halim
>>>>>
>>>>>
>>>>> On Tue, Jun 8, 2010 at 4:00 AM, Mark Kerzner <markkerzner@gmail.com> =
wrote:
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> anybody has read it?
>>>>>>
>>>>>> Thank you,
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Claudio Martella
>>>> Digital Technologies
>>>> Unit Research & Development - Analyst
>>>>
>>>> TIS innovation park
>>>> Via Siemens 19 | Siemensstr. 19
>>>> 39100 Bolzano | 39100 Bozen
>>>> Tel. +39 0471 068 123
>>>> Fax =C2=A0+39 0471 068 129
>>>> claudio.martella@tis.bz.it http://www.tis.bz.it
>>>>
>>>> Short information regarding use of personal data. According to Section=
 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you th=
at we process your personal data in order to fulfil contractual and fiscal =
obligations and also to send you information regarding our services and eve=
nts. Your personal data are processed with and without electronic means and=
 by respecting data subjects' rights, fundamental freedoms and dignity, par=
ticularly with regard to confidentiality, personal identity and the right t=
o personal data protection. At any time and without formalities you can wri=
te an e-mail to privacy@tis.bz.it in order to object the processing of your=
 personal data for the purpose of sending advertising materials and also to=
 exercise the right to access personal data and other rights referred to in=
 Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation=
 Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete infor=
mation on the web site www.tis.bz.it.
>>>>
>>>>
>>>>
>>>>
>
>
> --
> Claudio Martella
> Digital Technologies
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax =C2=A0+39 0471 068 129
> claudio.martella@tis.bz.it http://www.tis.bz.it
>
> Short information regarding use of personal data. According to Section 13=
 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that =
we process your personal data in order to fulfil contractual and fiscal obl=
igations and also to send you information regarding our services and events=
. Your personal data are processed with and without electronic means and by=
 respecting data subjects' rights, fundamental freedoms and dignity, partic=
ularly with regard to confidentiality, personal identity and the right to p=
ersonal data protection. At any time and without formalities you can write =
an e-mail to privacy@tis.bz.it in order to object the processing of your pe=
rsonal data for the purpose of sending advertising materials and also to ex=
ercise the right to access personal data and other rights referred to in Se=
ction 7 of Decree 196/2003. The data controller is TIS Techno Innovation Al=
to Adige, Siemens Street n. 19, Bolzano. You can find the complete informat=
ion on the web site www.tis.bz.it.
>
>
>
>
>
> * ** ** ** ** ** * _ c c_ l l_ a a_ u u_ d d_ i i_ o o_ =C2=A0_ m m_ a a_=
 r r_ t t_ e e_ l l_ l l_ a a * ** ** ** ** ** *
> * ** ** ** ** * _ [ [_ r r_ s s_ s s_ =C2=A0_ f f_ e e_ e e_ d d_ ] ] _ a=
 a_ r r_ c c_ h h_ i i_ v v_ e e _ a a_ b b_ o o_ u u_ t t * ** ** ** ** *
> * ** ** ** ** ** * _ G G_ o o_ o o_ g g_ l l_ e e_ =C2=A0_ P P_ r r_ e e_=
 g g_ e e_ l l_ =C2=A0_ i i_ s s_ =C2=A0_ o o_ u u_ t t_ . ._ =C2=A0_ B B_ =
u u_ t t_ =C2=A0_ w w_ h h_ a a_ t t_ =C2=A0_ i i_ s s_ =C2=A0_ P P_ r r_ e=
 e_ g g_ e e_ l l_ ? ? * ** ** ** ** ** *
> June 21, 2010
> Google Pregel's _ p_ a_ p_ e_ r is finally out. But what is Pregel? It ha=
s been mentioned
> in many posts talking about NoSQL, GrabhDBs, Big Data, even the Facebook
> OpenGraph, so it looks like, apart of the hype fuzz, there's a little con=
fusion
> about what it is, what it does, what it's good for and what it certainly =
is
> not. Let's start with the latter.
> Google Pregel is not a database (neither RDBMS nor NoSQL), no key-value s=
tore
> or any new means of storing data (big or small it might be). Putting it i=
n the
> same lists with GraphDBs like _ N_ e_ o_ 4_ j, _ H_ y_ p_ e_ r_ G_ r_ a_ =
p_ h_ D_ B or even Twitter's _ F_ l_ o_ c_ k_ D_ B is
> somehow like putting MapReduce in the NoSQL group.
> GraphDBs are storage systems that use graph representations for data wher=
e each
> node represents an entity with unique ids, type and properties. An arc
> represents a relationship between two nodes and itself can have a type an=
d
> properties. Think of a GraphDB as a RDBMS where instead of tables you hav=
e a
> graph. It's Semantic Web's triple stores brought to general purpose.
> Why would you use a GraphDB? Well, as you can describe your data in terms=
 of
> entities and relationship, you're able to avoid defining a schema as we k=
now
> it. It's a smaller step towards schema-less representation of data that a=
ctual
> NoSQLs provide.
> Informally, you can describe your data with ER diagrams without translati=
ng
> them into tables, keeping the representation dynamic and avoiding the cos=
ts of
> schema redefinition in RDBMs. Plus, it's very efficient and easy to write
> queries that allow you to get, for example, all the followers of a user, =
all
> the users he follows, the items related to him (tweets he wrote?) and may=
be the
> users connected to his items (users retweeting or reply his tweets?) in o=
ne go.
> That's basically what Twitter wants to achieve with FlockDB, but with a g=
eneral
> GraphDB you can describe your data and the relationships between your dat=
a as
> you wish.
> You store data in a GraphDB and you recall it in an easy and efficient wa=
y. So
> what's Pregel good for? What if if you want to mine the data in the graph=
 (i.e.
> Google's Pagerank, Facebook's social network analysis, Twitter's retweeti=
ng/
> authority analysis)? Google reports that 80% of their distributed computa=
tion
> is based on MapReduce (Google Maps, Search Indexing, clustering in Google=
 News,
> reports of Google Trends, Google Translate etc.) so we can only guess tha=
t the
> rest 20% is based on Pregel and the authors report they can work with gra=
phs of
> the size of billions of vertices. Plus, implementing Pagerank is just abo=
ut 15
> lines of code...
> That's what Pregel is for. So what is it? Pregel is a system for large-sc=
ale
> graph processing. It provides a fault-tolerant framework for the executio=
n of
> graph algorithms in parallel over many machines. Think of it as MapReduce=
 re-
> thought for graph operations.
> But what's wrong with MapReduce and graph algorithms? Nothing particularl=
y,
> though it can lead to suboptimal performance because the graph state has =
to be
> passed from one phase to the other generating a lot of I/O, but in genera=
l we
> can say it has some usability issues as it doesn't provide a way to do an=
y per-
> vertex calculation. In general, it's not easy to express graph algorithms=
 in M/
> R. Pregel fills a gap as there are no frameworks for graph processing tha=
t
> address both distributability and fault-tolerance.
> Pregel's architecture is inspired by the Bulk Synchronous Parallel model
> introduced by Valiant. BSP is a computational model for the execution of
> parallel algorithms on top of multiple sequential Von Neumann machines. I=
t
> gives an abstraction, just like M/R, that allows the programmer to think =
about
> the parallel expression of his solution without the hassle of communicati=
on and
> memory allocation in a distributed system. Before we get into details I t=
hink
> two things have to be underlined.
> First, again like M/R, although the model is used by Google to distribute
> computation among multiple computers that's not necessary, in principle B=
SP
> fits parallel programming on SMP or NUMA machines and mainframes.
> Second, although the model is used by Google to distribute graph processi=
ng,
> BSP can be used to distribute other kind of algorithms like matrix
> manipulation, just like M/R.
> Ok, how does BSP work? I'll take the diagram and snippet from the _ B_ S_=
 P_ =C2=A0_ p_ a_ g_ e_ =C2=A0_ f_ r_ o_ m
> _ W_ i_ k_ i_ p_ e_ d_ i_ a:
> =E2=80=9A=C3=84=C3=BAA BSP computer consists of processors connected by a=
 communication network.
> Each processor has a fast local memory, and may follow different threads =
of
> computation.
> A BSP computation proceeds in a series of global supersteps. A superstep
> consists of three ordered stages:
> =C2=A0 1. Concurrent computation: Several computations take place on ever=
y
> =C2=A0 =C2=A0 =C2=A0participating processor. Each process only uses value=
s stored in the
> =C2=A0 =C2=A0 =C2=A0local memory of the processor. The computations are i=
ndependent in the
> =C2=A0 =C2=A0 =C2=A0sense that they occur asynchronously of all the other=
s.
> =C2=A0 2. Communication: At this stage, the processes exchange data betwe=
en
> =C2=A0 =C2=A0 =C2=A0themselves.
> =C2=A0 3. Barrier synchronisation: When a process reaches this point (the=
 barrier),
> =C2=A0 =C2=A0 =C2=A0it waits until all other processes have finished thei=
r communication
> =C2=A0 =C2=A0 =C2=A0actions.
> The figure below shows this in a diagrammatic form. The processes are not
> regarded as having a particular linear order (from left to right or other=
wise),
> and may be mapped to processors in any way.=E2=80=9A=C3=84=C3=B9
> [bsp architecture]
> Ok, basically at every superstep every processor executes the same algori=
thm on
> its data: its state and the incoming messages. At superstep t every proce=
ssor
> will work on its state, which is the result of its computation at superst=
ep t-
> 1, and the messages sent to him at superstep t-1. As a result of the supe=
rstep
> t computation the processor will send messages to other processors and th=
ese
> messages will be the incoming messages at superstep t+1. And the cycle go=
es on.
> The barrier synchronisation is the moment where t gets to be t+1.
> It is easy to see that each computation should take approximately the sam=
e
> amount of time, otherwise a long lasting computation will force the other=
s to
> wait idle.
> How does Pregel implement BSP? Quoting Pregel's original paper: =E2=80=9A=
=C3=84=C3=BAThe input to
> a Pregel computation is a directed graph in which each vertex is uniquely
> identi=C3=94=C2=A8=C3=85ed by a string vertex identi=C3=94=C2=A8=C3=85er.=
 Each vertex is associated with a
> modi=C3=94=C2=A8=C3=85able, user de=C3=94=C2=A8=C3=85ned value. The direc=
ted edges are associated with their
> source vertices, and each edge consists of a modi=C3=94=C2=A8=C3=85able, =
user de=C3=94=C2=A8=C3=85ned value
> and a target vertex identi=C3=94=C2=A8=C3=85er. A typical Pregel computat=
ion consists of
> input, when the graph is initialized, followed by a sequence of superstep=
s
> separated by global synchronization points until the algorithm terminates=
, and
> =C3=94=C2=A8=C3=85nishing with output.
> Within each superstep the vertices compute in parallel, each executing th=
e same
> user-de=C3=94=C2=A8=C3=85ned function that expresses the logic of a given=
 algorithm. A vertex
> can modify its state or that of its outgoing edges, receive messages sent=
 to it
> in the previous superstep, send messages to other vertices (to be receive=
d in
> the next superstep), or even mutate the topology of the graph. Edges are =
not
> =C3=94=C2=A8=C3=85rst-class citizens in this model, having no associated =
computation.
> Algorithm termination is based on every vertex voting to halt. In superst=
ep 0,
> every vertex is in the active state; all active vertices participate in t=
he
> computation of any given superstep. A vertex deactivates itself by voting=
 to
> halt. This means that the vertex has no further work to do unless trigger=
ed
> externally, and the Pregel framework will not execute that vertex in subs=
equent
> supersteps unless it receives a message. If reactivated by a message, a v=
ertex
> must explicitly deactivate itself again. The algorithm as a whole termina=
tes
> when all vertices are simultaneously inactive and there are no messages i=
n
> transit.=E2=80=9A=C3=84=C3=B9
> The mapping between BSP and Pregel is very simple: each local computation=
 of
> BSP maps to the user-defined function in Pregel and the communication oft=
en,
> but not necessarily, corresponds to edge connectivity between nodes. The
> barrier is defined by the halt voting of all the active nodes.
> From the perspective of the API Pregel requires the implementation of the
> virtual Compute() method of the class Vertex. The class Vertex itself pro=
vides
> VoteToHalt(), SendMessageTo(), GetValue(), GetOutEdgeIterator() and const
> methods superstep() and vertex_id().
> Like M/R, it provides the possibility to define Combiners in order to red=
uce
> message passing overhead by combining messages together where semanticall=
y
> possible. Like Sawzall Pregel provides Aggregators which allow global
> communication by receiving messages from multiple vertices, combining the=
m and
> sending the result back to the vertices. They are useful for statistics (=
think
> of an histogram of vertex degrees) or for global controlling (for example=
 an
> aggregator can collect all the vertices' PageRank deltas to calculate the
> convergence condition).
> From the perspective of architecture, Pregel follows a master/worker
> architecture, like most of the other Google frameworks. The master is
> responsible of partitioning the graph with a hash function based on the v=
ertex
> ID (like hash(ID) mod #partitions although a topology-aware partitioner m=
ight
> be able to minimize communication between workers by keeping messages int=
ra-
> machine) but doesn't compute any partition. At the beginning of computati=
on the
> workers subscribe to the computation to the master.
> Once the graph is partitioned and the partitions are assigned to workers,=
 the
> master issues the start of the superstep. Each worker loops through all h=
is
> active vertices, calling the Compute() method and delivering the messages
> collected in the previous superstep. The new messages are delivered befor=
e the
> end of the superstep, right before telling the master the list of active
> vertices for the next superstep.
> After the computation halts the master might ask the workers to dump thei=
r
> graph partition to disk.
> At the moment there are no projects that handle such computational power =
over
> graphs. _ P_ a_ r_ a_ l_ l_ e_ l_ =C2=A0_ B_ G_ L and _ C_ G_ M_ L_ i_ b =
can handle parallel processing on graphs but
> don't scale to this size and is not fault-tolerant. _ H_ a_ m_ a, an Apac=
he incubated
> project, aims at developing a similar model to Pregel, but it's not compl=
ete
> yet.
> So, where do I go now?
> _ G_ o_ o_ g_ l_ e_ '_ s_ =C2=A0_ R_ e_ s_ e_ a_ r_ c_ h_ =C2=A0_ B_ l_ o=
_ g_ =C2=A0_ a_ n_ n_ o_ u_ n_ c_ e_ m_ e_ n_ t
> _ B_ S_ P_ =C2=A0_ W_ o_ r_ l_ d_ -_ w_ i_ d_ e
> _ N_ o_ S_ Q_ L_ =C2=A0_ G_ r_ a_ p_ h_ D_ B
> =C2=A0Please enable JavaScript to view the _ c_ o_ m_ m_ e_ n_ t_ s_ =C2=
=A0_ p_ o_ w_ e_ r_ e_ d_ =C2=A0_ b_ y_ =C2=A0_ D_ i_ s_ q_ u_ s_ . _ b_ l_=
 o_ g_ =C2=A0_ c_ o_ m_ m_ e_ n_ t_ s
> _ p_ o_ w_ e_ r_ e_ d_ =C2=A0_ b_ y_ =C2=A0_ D_ i_ s_ q_ u_ s
>
>


--=20
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org