From: Mirko Kämpf <mirko.kaempf@cloudera.com>
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs
Date: Fri, 6 Sep 2013 11:07:34 +0200

Hi Claudio and Marco,

thanks for your comments!

I also see the latency problem in this case, and I would like to have a "generic method" which is then implemented on maybe two levels: a loosely coupled one with workflows (maybe Oozie), which just reloads the graph (no injection), and another one with tight integration, which allows injection of incoming new data from outside, or even after an external procedure was triggered. The most obvious way to do the injection would be a low-latency request to an HBase cluster to update node or edge properties on the fly, or even the integration of Impala (later on, which is again further away or more decoupled).
I am not sure what the really useful approach is, so I think a generic description of the three modes will be the first step towards finishing this "possible design" phase. Implementation should start close to the way Claudio explains his view on injection.

Claudio and Marco, would you be interested in reviewing the current manuscript for a paper, or even in collaborating on it to make it part of Giraph?

I am not sure if it fits well, but it could be something like a vision document for next steps / development stages. And it could also be a contribution to upcoming conferences / meetings.

Best wishes
Mirko

On Fri, Sep 6, 2013 at 9:56 AM, Claudio Martella <claudio.martella@gmail.com> wrote:
Hi Mirko,

this is in general the kind of approach I was suggesting, but looked at from a broader perspective. I'd tend to avoid calling other tools such as Hive or Pig often to compute injections, as Giraph is still a batch-processing system and this could really introduce latency and reduce throughput. I feel that if the injection of vertices and edges really required such complexity (such as computing them with M/R), then one could just create a pipeline of jobs. But this is only my superficial analysis/speculation; I can see your point on integration, and your proposal is very interesting.


On Sun, Aug 25, 2013 at 8:55 AM, Mirko Kämpf <mirko.kaempf@cloudera.com> wrote:
Good morning Gentlemen,
as far as I understand your thread, you are talking about the same topic I have been thinking about and working on for some time.
I work on a research project focused on the evolution of networks and on network dynamics in networks of networks.

My understanding of Marco's question is that he needs to change node properties, or even wants to add nodes to the graph, while it is being processed, right?

With the WorkerContext we could construct a "Connector" to the outside world, not just for loading data from HDFS, which also requires a preprocessing step for the data that has to be loaded. I often think about HBase: all my nodes and edges live in HBase. From there it is quite easy to load new data based on a simple "Scan", or, if the WorkerContext triggers a Hive or Pig script, one can automatically reorganize or extract relevant new links / nodes which have to be added to the graph.

Such an approach means that after n supersteps of the Giraph layer an additional utility step (triggered via WorkerContext, or any other better-fitting class from Giraph; I am not sure yet where to start) is executed. Before such a step, the state of the graph is persisted to allow fallback or resume. The utility step can be a processing operation (MR, Mahout) or just a load operation (from HDFS, HBase), and it allows a kind of clocked data flow directly into a running Giraph application. I think this is a very important feature in Complex Systems research, as we have interacting layers which change in parallel. In this picture the Giraph steps are the steps of layer A, let's say something that is going on on top of a network, and the utility step expresses the changes in the underlying structure affecting the network itself, but based on the data / properties of the second subsystem, e.g. the agents operating on top of the network.
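
A minimal sketch of this clocked loop may help to pin down the ordering of the steps. It is plain Java and does not use any Giraph class; `ClockedRun`, `UtilityStep` and the log strings are hypothetical names chosen only for illustration. It assumes the checkpoint is written first and the utility step runs afterwards, once every n supersteps.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "n supersteps, then one utility step"; not Giraph code. */
final class ClockedRun {
    /** Hypothetical hook standing in for a load (HDFS, HBase) or a processing job. */
    interface UtilityStep {
        void run(long superstep, List<String> log);
    }

    /**
     * Runs totalSupersteps supersteps and fires the utility step after every
     * n-th one. A checkpoint marker is recorded first, so that a failed
     * utility step could fall back to the persisted state.
     */
    static List<String> run(int totalSupersteps, int n, UtilityStep utility) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> log = new ArrayList<>();
        for (long s = 0; s < totalSupersteps; s++) {
            log.add("superstep " + s);
            boolean boundary = (s + 1) % n == 0;
            if (boundary) {
                log.add("checkpoint after " + s); // persist the graph state first
                utility.run(s, log);              // then let the structure change
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = run(4, 2, (s, l) -> l.add("utility after " + s));
        log.forEach(System.out::println);
    }
}
```

In a real job the loop itself belongs to the framework, so only the boundary test and the two actions would live in user code.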

I created a tool which worked like this, but not at scale, and it was at a time before Giraph. What do you think, is there a need for this kind of extension in the Giraph world?

Have a nice Sunday.

Best wishes
Mirko

--
Mirko Kämpf

Trainer @ Cloudera

tel: +49 176 20 63 51 99
skype: kamir1604



On Wed, Aug 21, 2013 at 3:30 PM, Claudio Martella <claudio.martella@gmail.com> wrote:
As I said, the injection of the new vertices/edges would have to be done "manually", hence without any support from the infrastructure. I'd suggest you implement a WorkerContext class that supports the reading of a specific file with a specific format (under your control) from HDFS, and that is accessed by this particular "special" vertex (e.g. based on the vertex ID).
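
As an illustration of what "a specific format (under your control)" could look like, here is a small parser sketch. The line format (`<superstep> V <id>` and `<superstep> E <src> <dst>`) and the names `MutationFile` and `Mutation` are made up for this example and are not part of Giraph. Reading the lines from HDFS is left out, so the method simply takes the lines as a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Parses a made-up mutation file format; the format itself is an assumption. */
final class MutationFile {
    /** One requested change: a new vertex (dst == null) or a new edge. */
    static final class Mutation {
        final long src;
        final Long dst;

        Mutation(long src, Long dst) {
            this.src = src;
            this.dst = dst;
        }

        boolean isEdge() {
            return dst != null;
        }
    }

    /**
     * Groups mutations by the superstep at which they should be applied.
     * Accepted lines: "<superstep> V <id>" and "<superstep> E <src> <dst>".
     * Blank lines and lines starting with '#' are skipped.
     */
    static Map<Long, List<Mutation>> parse(List<String> lines) {
        Map<Long, List<Mutation>> bySuperstep = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] f = line.split("\\s+");
            boolean vertex = f.length == 3 && f[1].equals("V");
            boolean edge = f.length == 4 && f[1].equals("E");
            if (!vertex && !edge) {
                throw new IllegalArgumentException("bad mutation line: " + raw);
            }
            long superstep = Long.parseLong(f[0]);
            Long dst = edge ? Long.valueOf(f[3]) : null;
            Mutation m = new Mutation(Long.parseLong(f[2]), dst);
            bySuperstep.computeIfAbsent(superstep, k -> new ArrayList<>()).add(m);
        }
        return bySuperstep;
    }
}
```

Keying the result by superstep lets the special vertex look up only the entries for the current superstep instead of rescanning the whole file.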

Does this make sense?


On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz <m.a.b.lotz@stu12.qmul.ac.uk> wrote:
Dear Mr. Martella,

Once the conditions for updating the vertex database are met, what is the best way for the Injector Vertex to call an input reader again?

I am able to access all the HDFS data, but I guess the vertex would need to have access to the input splits and also to the vertex input format that I designate. Am I correct? Or is there a way to just ask ZooKeeper to create new splits from a given path in DFS and distribute them to the workers?

Best Regards,
Marco Lotz

From: Claudio Martella <claudio.martella@gmail.com>
Sent: 14 August 2013 15:25
To: user@giraph.apache.org
Subject: Re: Dynamic Graphs

Hi Marco,

Giraph currently does not support that. One way of doing this would be to have a specific (pseudo-)vertex act as the "injector" of the new vertices and edges. For example, it would read a file from HDFS and call the mutable API during the computation, superstep after superstep.
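
The injector idea can be sketched with a toy model, again in plain Java without Giraph classes. `InjectorDemo`, its methods and the `schedule` map are hypothetical names. The sketch assumes that mutation requests issued during one superstep only become visible at the barrier before the next one, which is how the mutable API is generally described; please check this against the Giraph version you use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy injector vertex; a stand-in for the idea, not the Giraph API. */
final class InjectorDemo {
    /** Adjacency sets of the toy graph, keyed by vertex ID. */
    final Map<Long, Set<Long>> graph = new TreeMap<>();
    /** Edge requests queued during the current superstep. */
    private final List<long[]> pending = new ArrayList<>();

    /** Called only for the injector vertex: queue this superstep's new edges. */
    void injectorCompute(long superstep, Map<Long, List<long[]>> schedule) {
        pending.addAll(schedule.getOrDefault(superstep, new ArrayList<>()));
    }

    /** Barrier between supersteps: queued requests become visible here. */
    void endSuperstep() {
        for (long[] e : pending) {
            graph.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            graph.computeIfAbsent(e[1], k -> new TreeSet<>()); // create the target vertex
        }
        pending.clear();
    }

    public static void main(String[] args) {
        Map<Long, List<long[]>> schedule = new TreeMap<>();
        schedule.computeIfAbsent(1L, k -> new ArrayList<>()).add(new long[] {1, 2});
        InjectorDemo demo = new InjectorDemo();
        for (long s = 0; s < 3; s++) {
            demo.injectorCompute(s, schedule);
            System.out.println("superstep " + s + " sees " + demo.graph);
            demo.endSuperstep();
        }
    }
}
```

Here the edge scheduled for superstep 1 is requested in superstep 1 and is part of the graph from superstep 2 on, so ordinary vertices never observe a half-applied change.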


On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz <m.a.b.lotz@stu12.qmul.ac.uk> wrote:
Hello all,

I would like to know if there is any way to use dynamic graphs with Giraph. By dynamic, I mean graphs that may change while Giraph is computing/deliberating. The changes are in the input file and are not caused by the graph computation itself.

Is there any way to analyse them using Giraph? If not, does anyone have any idea/suggestion on whether it is possible to modify the framework in order to process them?
Best Regards,
Marco Lotz



--
Claudio Martella
claudio.martella@gmail.com






--
Mirko Kämpf

Trainer @ Cloudera

tel: +49 176 20 63 51 99
skype: kamir1604
mirko@cloudera.com
