giraph-user mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: Dynamic Graphs
Date Fri, 06 Sep 2013 07:56:15 GMT
Hi Mirko,

this is in general the kind of approach I was suggesting, but looked at from
a broader perspective. I'd tend to avoid frequently calling other tools such
as Hive or Pig to compute injections, as Giraph is still a batch-processing
system and this could introduce latency and reduce throughput. I feel that
if the injection of vertices and edges really required such complexity
(such as computing them with M/R), then one could just create a
pipeline of jobs. But this is only my superficial analysis/speculation; I
can see your point on integration and your proposal is very interesting.
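
To make the simpler "injector" variant concrete, here is a rough sketch. It is
plain Java and uses no Giraph classes, so the class name, the method names and
the line format ("superstep,V,id" / "superstep,E,src,dst") are all
assumptions made up for illustration. In a real job the parsing would sit in
a WorkerContext reading from HDFS, and the changes would go through the
mutable API rather than a local map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java model of the "injector" idea. No Giraph classes are used;
// every name and the line format below are hypothetical.
class DynamicGraphSketch {

    /** One scheduled change: add a vertex ("V") or a directed edge ("E"). */
    static final class Mutation {
        final String kind;
        final long source;
        final long target; // ignored when kind is "V"

        Mutation(String kind, long source, long target) {
            this.kind = kind;
            this.source = source;
            this.target = target;
        }
    }

    /** Adjacency sets keyed by vertex id; stands in for the distributed graph. */
    final Map<Long, Set<Long>> adjacency = new TreeMap<>();

    /**
     * Parses lines of the assumed form "superstep,V,id" or "superstep,E,src,dst"
     * into a plan keyed by the superstep at which each change is injected.
     */
    static Map<Long, List<Mutation>> parse(List<String> lines) {
        Map<Long, List<Mutation>> plan = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split(",");
            long superstep = Long.parseLong(f[0]);
            long source = Long.parseLong(f[2]);
            long target = f[1].equals("E") ? Long.parseLong(f[3]) : source;
            plan.computeIfAbsent(superstep, k -> new ArrayList<>())
                .add(new Mutation(f[1], source, target));
        }
        return plan;
    }

    /** Applies one change; adding an edge also creates missing endpoints. */
    void apply(Mutation m) {
        adjacency.computeIfAbsent(m.source, k -> new TreeSet<>());
        if (m.kind.equals("E")) {
            adjacency.computeIfAbsent(m.target, k -> new TreeSet<>());
            adjacency.get(m.source).add(m.target);
        }
    }

    /** Superstep loop: inject the changes scheduled for a step, then compute. */
    void run(long supersteps, Map<Long, List<Mutation>> plan) {
        for (long s = 0; s < supersteps; s++) {
            for (Mutation m : plan.getOrDefault(s, List.of())) {
                apply(m);
            }
            // The regular per-vertex computation for superstep s would go here.
        }
    }

    public static void main(String[] args) {
        DynamicGraphSketch g = new DynamicGraphSketch();
        // The change scheduled for superstep 5 is never reached in a 3-step run.
        g.run(3, parse(List.of("0,V,1", "1,E,1,2", "5,E,2,3")));
        System.out.println(g.adjacency); // prints {1=[2], 2=[]}
    }
}
```

Keying the plan by superstep keeps the injection cheap: each step is a single
map lookup, and nothing outside the job has to be called while it runs.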


On Sun, Aug 25, 2013 at 8:55 AM, Mirko Kämpf <mirko.kaempf@cloudera.com> wrote:

> Good morning Gentlemen,
>
> as far as I understand your thread, you are talking about the same topic I
> have been thinking about and working on for some time.
> I work on a research project focused on the evolution of networks and network
> dynamics in networks of networks.
>
> My understanding of Marco's question is that he needs to change node
> properties or even add nodes to the graph while it is being processed,
> right?
>
> With the WorkerContext we could construct a "Connector" to the outside
> world, not just for loading data from HDFS, which also requires a
> preprocessing step for the data to be loaded. I think about HBase often.
> All my nodes and edges live in HBase. From there it is quite easy to load
> new data based on a simple "Scan". Or, if the WorkerContext triggers a
> Hive or Pig script, one can automatically reorganize or extract the relevant
> new links / nodes which have to be added to the graph.
>
> Such an approach means that after n supersteps of the Giraph layer an
> additional utility-step (triggered via WorkerContext, or any other better
> fitting class from Giraph; I am not sure yet where to start) is executed.
> Before such a step the state of the graph is persisted to allow fallback
> or resume. The utility-step can be a processing (MR, Mahout) or just a load
> (from HDFS, HBase) operation, and it allows a kind of clocked data flow
> directly into a running Giraph application. I think this is a very
> important feature in Complex Systems research, as we have interacting
> layers which change in parallel. In this picture the Giraph steps are the
> steps of layer A, let's say something that is going on on top of a network,
> and the utility-step expresses the changes in the underlying structure
> affecting the network itself but based on the data / properties of the
> second subsystem, e.g. the agents operating on top of the network.
>
> I created a tool which worked like this, but not at scale, and it was
> at a time before Giraph. What do you think, is there a need for this kind
> of extension in the Giraph world?
>
> Have a nice Sunday.
>
> Best wishes
> Mirko
>
> --
> --
> Mirko Kämpf
>
> *Trainer* @ Cloudera
>
> tel: +49 *176 20 63 51 99*
> skype: *kamir1604*
> mirko@cloudera.com
>
>
>
> On Wed, Aug 21, 2013 at 3:30 PM, Claudio Martella <
> claudio.martella@gmail.com> wrote:
>
>> As I said, the injection of the new vertices/edges would have to be done
>> "manually", hence without any support of the infrastructure. I'd suggest
>> you implement a WorkerContext class that supports the reading of a specific
>> file with a specific format (under your control) from HDFS, and that is
>> accessed by this particular "special" vertex (e.g. based on the vertex ID).
>>
>> Does this make sense?
>>
>>
>> On Wed, Aug 21, 2013 at 2:13 PM, Marco Aurelio Barbosa Fagnani Lotz <
>> m.a.b.lotz@stu12.qmul.ac.uk> wrote:
>>
>>>  Dear Mr. Martella,
>>>
>>> Once the conditions for updating the vertex database are met, what is
>>> the best way for the Injector Vertex to call an input reader again?
>>>
>>> I am able to access all the HDFS data, but I guess the vertex would need
>>> to have access to the input splits and also the vertex input format that I
>>> designate. Am I correct? Or is there a way that one can just ask Zookeeper
>>> to create new splits and distribute them to the workers, given a path in DFS?
>>>
>>> Best Regards,
>>> Marco Lotz
>>>  ------------------------------
>>> *From:* Claudio Martella <claudio.martella@gmail.com>
>>> *Sent:* 14 August 2013 15:25
>>> *To:* user@giraph.apache.org
>>> *Subject:* Re: Dynamic Graphs
>>>
>>>  Hi Marco,
>>>
>>>  Giraph currently does not support that. One way of doing this would be
>>> by having a specific (pseudo-)vertex act as the "injector" of the new
>>> vertices and edges. For example, it would read a file from HDFS and call the
>>> mutable API during the computation, superstep after superstep.
>>>
>>>
>>> On Wed, Aug 14, 2013 at 3:02 PM, Marco Aurelio Barbosa Fagnani Lotz <
>>> m.a.b.lotz@stu12.qmul.ac.uk> wrote:
>>>
>>>>  Hello all,
>>>>
>>>> I would like to know if there is any way to use dynamic graphs with
>>>> Giraph. By dynamic I mean graphs that may change while Giraph is
>>>> computing/deliberating. The changes are in the input file and are not
>>>> caused by the graph computation itself.
>>>>
>>>> Is there any way to analyse them using Giraph? If not, does anyone have
>>>> any idea/suggestion on whether it is possible to modify the framework in
>>>> order to process them?
>>>>
>>>> Best Regards,
>>>> Marco Lotz
>>>>
>>>
>>>
>>>
>>>  --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com
>>>
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com
>>
>
>
>
>
>


-- 
   Claudio Martella
   claudio.martella@gmail.com
