Return-Path: Delivered-To: apmail-incubator-hama-user-archive@minotaur.apache.org Received: (qmail 58128 invoked from network); 5 Jul 2010 08:22:46 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 5 Jul 2010 08:22:46 -0000 Received: (qmail 29827 invoked by uid 500); 5 Jul 2010 08:22:46 -0000 Delivered-To: apmail-incubator-hama-user-archive@incubator.apache.org Received: (qmail 29700 invoked by uid 500); 5 Jul 2010 08:22:44 -0000 Mailing-List: contact hama-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hama-user@incubator.apache.org Delivered-To: mailing list hama-user@incubator.apache.org Received: (qmail 29691 invoked by uid 99); 5 Jul 2010 08:22:43 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 05 Jul 2010 08:22:43 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.9] (HELO minotaur.apache.org) (140.211.11.9) by apache.org (qpsmtpd/0.29) with SMTP; Mon, 05 Jul 2010 08:22:38 +0000 Received: (qmail 58045 invoked by uid 99); 5 Jul 2010 08:22:15 -0000 Received: from localhost.apache.org (HELO mail-iw0-f175.google.com) (127.0.0.1) (smtp-auth username edwardyoon, mechanism plain) by minotaur.apache.org (qpsmtpd/0.29) with ESMTP; Mon, 05 Jul 2010 08:22:15 +0000 Received: by iwn34 with SMTP id 34so5278184iwn.6 for ; Mon, 05 Jul 2010 01:22:14 -0700 (PDT) MIME-Version: 1.0 Received: by 10.231.176.197 with SMTP id bf5mr2260952ibb.175.1278318133866; Mon, 05 Jul 2010 01:22:13 -0700 (PDT) Received: by 10.231.146.199 with HTTP; Mon, 5 Jul 2010 01:22:13 -0700 (PDT) In-Reply-To: <4C2E0EEC.3030205@tis.bz.it> References: <4C2DCF64.1010507@tis.bz.it> <756a97.15822.1299329fcf3.Coremail.zercal@126.com> <4C2E0EEC.3030205@tis.bz.it> Date: Mon, 5 Jul 2010 17:22:13 +0900 Message-ID: Subject: Re: Pregel article From: "Edward J. Yoon" To: hama-user@incubator.apache.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Wow, Thanks for sharing nice comment and blog post!! I'll look at it soon and try to join to this discussion. :) On Sat, Jul 3, 2010 at 1:08 AM, Claudio Martella wrote: > I'll try to make things clear. > > The end of a superstep is obtained by the un-activation of all the > vertices. In pregel the superstep is over when all the vertices call > VoteToHalt() (in hama it's done by the sync() method). This happens at > the end of each computation by each vertex. Each vertex is activated by > the arrival of a message directed to that vertex. This means that each > superstep computation is atomic and it should be considered in the > design of the algorithm. That's a change of paradigm and that's what > pregel author's call the "vertex's perspective programming". > > So no, there's no assumption about all the vertices to be active at the > beginning of each superstep. > > About Felix's argument: yes, fewer supersteps mean less communication > and synchronization overhead. At the same time, having longer supersteps > will mean that it's more probable that certain vertices end their > computation earlier than others, making them idle for a long time > (waiting for the others to finish), and loosing computational time. So > ideally it should be a good balance between long computational > supersteps (decreasing communication overhead) and short computational > supersteps (decreasing idle time). > This is an intrinsical problem of BSP models because of the barrier. On > the contrary DataFlow models don't have barriers and each computation is > more independent, therefore more similar to the model you have in mind. > > Hope this helps. > > I attach the text from my blog post (roughly obtained with html2text) as > requested. > > Cheers, > > Claudio > > > zercal wrote: >> The paper I found about pregel are not very detailed, >> is it "http://portal.acm.org/citation.cfm?id=3D1582716.1582723"? >> I guess, in this paper, vertices are assumed to be all actived at every = superstep. simply random >> access will reduce some communication cost but take more superstep. >> However, is there any way of vertice selection method can be performed >> that at every super step, each vertex knows whether to active according = to >> information it kept and received from other vertex? >> But I can't found more detail from that paper... >> Becides, I can not access your blogs. Would you please send me your arti= cle? >> >> Thank you very much! >> from Xiong Chenyan >> >> =C3=94=C3=9A2010-07-02 20:04:40=C2=A3=C2=AC"Felix Halim" =C3=90=C2=B4=C2=B5=C3=80=C2=A3=C2=BA >> >>> Exactly how to activate a particular vertex is not clear from the >>> paper (is it random access?) and this feature is probably not as good >>> as it sounds for complex graph algorithms. It might be better off to >>> assume all vertices are active (to reduce the overhead of the flag >>> needed and the space to make it randomly accessible, by storing it in >>> blocks or whatever). >>> >>> Here is my argument: >>> >>> The way Pregel (and existing MR) works is iterative, where each >>> iteration is separated by a super-step barrier where all messages have >>> to arrive. Algorithms that have fewer super-steps are preferable than >>> those that have large number of super-steps. In fact, we should >>> measure Algorithms in terms of the number of super-steps required. To >>> minimize the number of super-steps, likely we need to activate as much >>> vertices as possible to do all the work in current super-step, rather >>> than spill-over to the next super-step. In this case, the feature to >>> "turn off" vertices is useless, since most of the time all vertices >>> will be active to effectively reduce the number of super-steps. >>> >>> Unfortunately, I don't have experiments to backup my argument... I >>> don't have Pregel... >>> >>> Felix Halim >>> >>> >>> On Fri, Jul 2, 2010 at 7:37 PM, Claudio Martella >>> wrote: >>> >>>> I did too. See: >>>> >>>> http://blog.acaro.org/entry/pregel-is-out-but-what-is-pregel >>>> >>>> >>>> Felix Halim wrote: >>>> >>>>> I have. See my comment in this blog: >>>>> >>>>> http://blog.udanax.org/2010/06/summary-of-google-pregel.html >>>>> >>>>> Felix Halim >>>>> >>>>> >>>>> On Tue, Jun 8, 2010 at 4:00 AM, Mark Kerzner = wrote: >>>>> >>>>> >>>>>> Hi, >>>>>> >>>>>> anybody has read it? >>>>>> >>>>>> Thank you, >>>>>> Mark >>>>>> >>>>>> >>>>>> >>>>> >>>> -- >>>> Claudio Martella >>>> Digital Technologies >>>> Unit Research & Development - Analyst >>>> >>>> TIS innovation park >>>> Via Siemens 19 | Siemensstr. 19 >>>> 39100 Bolzano | 39100 Bozen >>>> Tel. +39 0471 068 123 >>>> Fax =C2=A0+39 0471 068 129 >>>> claudio.martella@tis.bz.it http://www.tis.bz.it >>>> >>>> Short information regarding use of personal data. According to Section= 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you th= at we process your personal data in order to fulfil contractual and fiscal = obligations and also to send you information regarding our services and eve= nts. Your personal data are processed with and without electronic means and= by respecting data subjects' rights, fundamental freedoms and dignity, par= ticularly with regard to confidentiality, personal identity and the right t= o personal data protection. At any time and without formalities you can wri= te an e-mail to privacy@tis.bz.it in order to object the processing of your= personal data for the purpose of sending advertising materials and also to= exercise the right to access personal data and other rights referred to in= Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation= Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete infor= mation on the web site www.tis.bz.it. >>>> >>>> >>>> >>>> > > > -- > Claudio Martella > Digital Technologies > Unit Research & Development - Analyst > > TIS innovation park > Via Siemens 19 | Siemensstr. 19 > 39100 Bolzano | 39100 Bozen > Tel. +39 0471 068 123 > Fax =C2=A0+39 0471 068 129 > claudio.martella@tis.bz.it http://www.tis.bz.it > > Short information regarding use of personal data. According to Section 13= of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that = we process your personal data in order to fulfil contractual and fiscal obl= igations and also to send you information regarding our services and events= . Your personal data are processed with and without electronic means and by= respecting data subjects' rights, fundamental freedoms and dignity, partic= ularly with regard to confidentiality, personal identity and the right to p= ersonal data protection. At any time and without formalities you can write = an e-mail to privacy@tis.bz.it in order to object the processing of your pe= rsonal data for the purpose of sending advertising materials and also to ex= ercise the right to access personal data and other rights referred to in Se= ction 7 of Decree 196/2003. The data controller is TIS Techno Innovation Al= to Adige, Siemens Street n. 19, Bolzano. You can find the complete informat= ion on the web site www.tis.bz.it. > > > > > > * ** ** ** ** ** * _ c c_ l l_ a a_ u u_ d d_ i i_ o o_ =C2=A0_ m m_ a a_= r r_ t t_ e e_ l l_ l l_ a a * ** ** ** ** ** * > * ** ** ** ** * _ [ [_ r r_ s s_ s s_ =C2=A0_ f f_ e e_ e e_ d d_ ] ] _ a= a_ r r_ c c_ h h_ i i_ v v_ e e _ a a_ b b_ o o_ u u_ t t * ** ** ** ** * > * ** ** ** ** ** * _ G G_ o o_ o o_ g g_ l l_ e e_ =C2=A0_ P P_ r r_ e e_= g g_ e e_ l l_ =C2=A0_ i i_ s s_ =C2=A0_ o o_ u u_ t t_ . ._ =C2=A0_ B B_ = u u_ t t_ =C2=A0_ w w_ h h_ a a_ t t_ =C2=A0_ i i_ s s_ =C2=A0_ P P_ r r_ e= e_ g g_ e e_ l l_ ? ? * ** ** ** ** ** * > June 21, 2010 > Google Pregel's _ p_ a_ p_ e_ r is finally out. But what is Pregel? It ha= s been mentioned > in many posts talking about NoSQL, GrabhDBs, Big Data, even the Facebook > OpenGraph, so it looks like, apart of the hype fuzz, there's a little con= fusion > about what it is, what it does, what it's good for and what it certainly = is > not. Let's start with the latter. > Google Pregel is not a database (neither RDBMS nor NoSQL), no key-value s= tore > or any new means of storing data (big or small it might be). Putting it i= n the > same lists with GraphDBs like _ N_ e_ o_ 4_ j, _ H_ y_ p_ e_ r_ G_ r_ a_ = p_ h_ D_ B or even Twitter's _ F_ l_ o_ c_ k_ D_ B is > somehow like putting MapReduce in the NoSQL group. > GraphDBs are storage systems that use graph representations for data wher= e each > node represents an entity with unique ids, type and properties. An arc > represents a relationship between two nodes and itself can have a type an= d > properties. Think of a GraphDB as a RDBMS where instead of tables you hav= e a > graph. It's Semantic Web's triple stores brought to general purpose. > Why would you use a GraphDB? Well, as you can describe your data in terms= of > entities and relationship, you're able to avoid defining a schema as we k= now > it. It's a smaller step towards schema-less representation of data that a= ctual > NoSQLs provide. > Informally, you can describe your data with ER diagrams without translati= ng > them into tables, keeping the representation dynamic and avoiding the cos= ts of > schema redefinition in RDBMs. Plus, it's very efficient and easy to write > queries that allow you to get, for example, all the followers of a user, = all > the users he follows, the items related to him (tweets he wrote?) and may= be the > users connected to his items (users retweeting or reply his tweets?) in o= ne go. > That's basically what Twitter wants to achieve with FlockDB, but with a g= eneral > GraphDB you can describe your data and the relationships between your dat= a as > you wish. > You store data in a GraphDB and you recall it in an easy and efficient wa= y. So > what's Pregel good for? What if if you want to mine the data in the graph= (i.e. > Google's Pagerank, Facebook's social network analysis, Twitter's retweeti= ng/ > authority analysis)? Google reports that 80% of their distributed computa= tion > is based on MapReduce (Google Maps, Search Indexing, clustering in Google= News, > reports of Google Trends, Google Translate etc.) so we can only guess tha= t the > rest 20% is based on Pregel and the authors report they can work with gra= phs of > the size of billions of vertices. Plus, implementing Pagerank is just abo= ut 15 > lines of code... > That's what Pregel is for. So what is it? Pregel is a system for large-sc= ale > graph processing. It provides a fault-tolerant framework for the executio= n of > graph algorithms in parallel over many machines. Think of it as MapReduce= re- > thought for graph operations. > But what's wrong with MapReduce and graph algorithms? Nothing particularl= y, > though it can lead to suboptimal performance because the graph state has = to be > passed from one phase to the other generating a lot of I/O, but in genera= l we > can say it has some usability issues as it doesn't provide a way to do an= y per- > vertex calculation. In general, it's not easy to express graph algorithms= in M/ > R. Pregel fills a gap as there are no frameworks for graph processing tha= t > address both distributability and fault-tolerance. > Pregel's architecture is inspired by the Bulk Synchronous Parallel model > introduced by Valiant. BSP is a computational model for the execution of > parallel algorithms on top of multiple sequential Von Neumann machines. I= t > gives an abstraction, just like M/R, that allows the programmer to think = about > the parallel expression of his solution without the hassle of communicati= on and > memory allocation in a distributed system. Before we get into details I t= hink > two things have to be underlined. > First, again like M/R, although the model is used by Google to distribute > computation among multiple computers that's not necessary, in principle B= SP > fits parallel programming on SMP or NUMA machines and mainframes. > Second, although the model is used by Google to distribute graph processi= ng, > BSP can be used to distribute other kind of algorithms like matrix > manipulation, just like M/R. > Ok, how does BSP work? I'll take the diagram and snippet from the _ B_ S_= P_ =C2=A0_ p_ a_ g_ e_ =C2=A0_ f_ r_ o_ m > _ W_ i_ k_ i_ p_ e_ d_ i_ a: > =E2=80=9A=C3=84=C3=BAA BSP computer consists of processors connected by a= communication network. > Each processor has a fast local memory, and may follow different threads = of > computation. > A BSP computation proceeds in a series of global supersteps. A superstep > consists of three ordered stages: > =C2=A0 1. Concurrent computation: Several computations take place on ever= y > =C2=A0 =C2=A0 =C2=A0participating processor. Each process only uses value= s stored in the > =C2=A0 =C2=A0 =C2=A0local memory of the processor. The computations are i= ndependent in the > =C2=A0 =C2=A0 =C2=A0sense that they occur asynchronously of all the other= s. > =C2=A0 2. Communication: At this stage, the processes exchange data betwe= en > =C2=A0 =C2=A0 =C2=A0themselves. > =C2=A0 3. Barrier synchronisation: When a process reaches this point (the= barrier), > =C2=A0 =C2=A0 =C2=A0it waits until all other processes have finished thei= r communication > =C2=A0 =C2=A0 =C2=A0actions. > The figure below shows this in a diagrammatic form. The processes are not > regarded as having a particular linear order (from left to right or other= wise), > and may be mapped to processors in any way.=E2=80=9A=C3=84=C3=B9 > [bsp architecture] > Ok, basically at every superstep every processor executes the same algori= thm on > its data: its state and the incoming messages. At superstep t every proce= ssor > will work on its state, which is the result of its computation at superst= ep t- > 1, and the messages sent to him at superstep t-1. As a result of the supe= rstep > t computation the processor will send messages to other processors and th= ese > messages will be the incoming messages at superstep t+1. And the cycle go= es on. > The barrier synchronisation is the moment where t gets to be t+1. > It is easy to see that each computation should take approximately the sam= e > amount of time, otherwise a long lasting computation will force the other= s to > wait idle. > How does Pregel implement BSP? Quoting Pregel's original paper: =E2=80=9A= =C3=84=C3=BAThe input to > a Pregel computation is a directed graph in which each vertex is uniquely > identi=C3=94=C2=A8=C3=85ed by a string vertex identi=C3=94=C2=A8=C3=85er.= Each vertex is associated with a > modi=C3=94=C2=A8=C3=85able, user de=C3=94=C2=A8=C3=85ned value. The direc= ted edges are associated with their > source vertices, and each edge consists of a modi=C3=94=C2=A8=C3=85able, = user de=C3=94=C2=A8=C3=85ned value > and a target vertex identi=C3=94=C2=A8=C3=85er. A typical Pregel computat= ion consists of > input, when the graph is initialized, followed by a sequence of superstep= s > separated by global synchronization points until the algorithm terminates= , and > =C3=94=C2=A8=C3=85nishing with output. > Within each superstep the vertices compute in parallel, each executing th= e same > user-de=C3=94=C2=A8=C3=85ned function that expresses the logic of a given= algorithm. A vertex > can modify its state or that of its outgoing edges, receive messages sent= to it > in the previous superstep, send messages to other vertices (to be receive= d in > the next superstep), or even mutate the topology of the graph. Edges are = not > =C3=94=C2=A8=C3=85rst-class citizens in this model, having no associated = computation. > Algorithm termination is based on every vertex voting to halt. In superst= ep 0, > every vertex is in the active state; all active vertices participate in t= he > computation of any given superstep. A vertex deactivates itself by voting= to > halt. This means that the vertex has no further work to do unless trigger= ed > externally, and the Pregel framework will not execute that vertex in subs= equent > supersteps unless it receives a message. If reactivated by a message, a v= ertex > must explicitly deactivate itself again. The algorithm as a whole termina= tes > when all vertices are simultaneously inactive and there are no messages i= n > transit.=E2=80=9A=C3=84=C3=B9 > The mapping between BSP and Pregel is very simple: each local computation= of > BSP maps to the user-defined function in Pregel and the communication oft= en, > but not necessarily, corresponds to edge connectivity between nodes. The > barrier is defined by the halt voting of all the active nodes. > From the perspective of the API Pregel requires the implementation of the > virtual Compute() method of the class Vertex. The class Vertex itself pro= vides > VoteToHalt(), SendMessageTo(), GetValue(), GetOutEdgeIterator() and const > methods superstep() and vertex_id(). > Like M/R, it provides the possibility to define Combiners in order to red= uce > message passing overhead by combining messages together where semanticall= y > possible. Like Sawzall Pregel provides Aggregators which allow global > communication by receiving messages from multiple vertices, combining the= m and > sending the result back to the vertices. They are useful for statistics (= think > of an histogram of vertex degrees) or for global controlling (for example= an > aggregator can collect all the vertices' PageRank deltas to calculate the > convergence condition). > From the perspective of architecture, Pregel follows a master/worker > architecture, like most of the other Google frameworks. The master is > responsible of partitioning the graph with a hash function based on the v= ertex > ID (like hash(ID) mod #partitions although a topology-aware partitioner m= ight > be able to minimize communication between workers by keeping messages int= ra- > machine) but doesn't compute any partition. At the beginning of computati= on the > workers subscribe to the computation to the master. > Once the graph is partitioned and the partitions are assigned to workers,= the > master issues the start of the superstep. Each worker loops through all h= is > active vertices, calling the Compute() method and delivering the messages > collected in the previous superstep. The new messages are delivered befor= e the > end of the superstep, right before telling the master the list of active > vertices for the next superstep. > After the computation halts the master might ask the workers to dump thei= r > graph partition to disk. > At the moment there are no projects that handle such computational power = over > graphs. _ P_ a_ r_ a_ l_ l_ e_ l_ =C2=A0_ B_ G_ L and _ C_ G_ M_ L_ i_ b = can handle parallel processing on graphs but > don't scale to this size and is not fault-tolerant. _ H_ a_ m_ a, an Apac= he incubated > project, aims at developing a similar model to Pregel, but it's not compl= ete > yet. > So, where do I go now? > _ G_ o_ o_ g_ l_ e_ '_ s_ =C2=A0_ R_ e_ s_ e_ a_ r_ c_ h_ =C2=A0_ B_ l_ o= _ g_ =C2=A0_ a_ n_ n_ o_ u_ n_ c_ e_ m_ e_ n_ t > _ B_ S_ P_ =C2=A0_ W_ o_ r_ l_ d_ -_ w_ i_ d_ e > _ N_ o_ S_ Q_ L_ =C2=A0_ G_ r_ a_ p_ h_ D_ B > =C2=A0Please enable JavaScript to view the _ c_ o_ m_ m_ e_ n_ t_ s_ =C2= =A0_ p_ o_ w_ e_ r_ e_ d_ =C2=A0_ b_ y_ =C2=A0_ D_ i_ s_ q_ u_ s_ . _ b_ l_= o_ g_ =C2=A0_ c_ o_ m_ m_ e_ n_ t_ s > _ p_ o_ w_ e_ r_ e_ d_ =C2=A0_ b_ y_ =C2=A0_ D_ i_ s_ q_ u_ s > > --=20 Best Regards, Edward J. Yoon edwardyoon@apache.org http://blog.udanax.org