spark-dev mailing list archives

From: Reynold Xin
Subject: Re: GraphX: New graph operator
Date: Wed, 03 Jun 2015 06:58:33 GMT

Hi Tarek,

I took a quick look at the materials you shared. It actually seems to me
it'd be super easy to express a graph as two DataFrames: one for edges
(srcid, dstid, and other edge attributes) and one for vertices (vid, and
other vertex attributes).
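
To make the two-table layout concrete, here is a minimal sketch in plain
Python. It is not Spark DataFrame code and is not taken from this thread: the
column names (vid, srcid, dstid) follow the description above, while the
sample data and the helper name `intersect` are hypothetical. It assumes each
graph is well formed, i.e. every edge endpoint appears in that graph's vertex
table.

```python
# Plain-Python stand-in for "one table of vertices, one table of edges".
# Rows are dicts; `intersect` is a hypothetical helper, not a Spark or GraphX API.

def intersect(g1, g2):
    """Keep vertices whose vid is in both graphs and edges whose
    (srcid, dstid) pair is in both graphs. Attributes come from g1."""
    v1, e1 = g1
    v2, e2 = g2
    shared_vids = {v["vid"] for v in v1} & {v["vid"] for v in v2}
    shared_edges = ({(e["srcid"], e["dstid"]) for e in e1}
                    & {(e["srcid"], e["dstid"]) for e in e2})
    vertices = [v for v in v1 if v["vid"] in shared_vids]
    edges = [e for e in e1 if (e["srcid"], e["dstid"]) in shared_edges]
    return vertices, edges

# Two small made-up graphs, each as (vertices, edges).
g1 = (
    [{"vid": 1, "name": "a"}, {"vid": 2, "name": "b"}, {"vid": 3, "name": "c"}],
    [{"srcid": 1, "dstid": 2, "w": 1.0}, {"srcid": 2, "dstid": 3, "w": 2.0}],
)
g2 = (
    [{"vid": 2, "name": "b"}, {"vid": 3, "name": "c"}, {"vid": 4, "name": "d"}],
    [{"srcid": 2, "dstid": 3, "w": 5.0}, {"srcid": 3, "dstid": 4, "w": 1.0}],
)

vertices, edges = intersect(g1, g2)
```

Matching on keys only (vid, and the srcid/dstid pair) is one possible choice;
comparing attributes as well would be a different operator.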


intersection is just


"join" is just


On Tue, Jun 2, 2015 at 12:12 AM, Tarek Auel wrote:

> Okay, thanks for your feedback.
> What is the expected behavior of union? Should it behave like SQL's UNION
> and/or UNION ALL? UNION ALL would be more or less trivial if we just
> concatenate the vertices and edges (vertex ID conflicts would have to be
> resolved). Should union look for duplicates based on the actual attribute
> (VD) or just the vertex ID? If it compares the attribute, it might be
> necessary to change the IDs of some vertices in order to resolve conflicts.
> Thanks a lot in advance for your input!
> On Mon, 1 Jun 2015 at 11:55 pm, Ankur Dave wrote:
>> I think it would be good to have more basic operators like union or
>> difference, as long as they have an efficient distributed implementation
>> and are plausibly useful.
>> If they can be written in terms of the existing GraphX API, it would be
>> best to put them into GraphOps to keep the core GraphX implementation
>> small. The `mask` operation should actually be in GraphOps -- it's only in
>> GraphImpl for historical reasons. On the other hand, `subgraph` needs to be
>> in GraphImpl for performance: it accesses EdgeRDDImpl#filter(epred, vpred),
>> which can't be a public EdgeRDD method because its semantics rely on an
>> implementation detail (vertex replication).
>> Ankur
>> On Mon, Jun 1, 2015 at 8:54 AM, Tarek Auel wrote:
>>> Hello,
>>> Someone proposed in a Jira issue to implement new graph operations. Sean
>>> Owen recommended checking with the mailing list first to see whether this
>>> is of interest.
>>> So I would like to know whether it would be worthwhile for GraphX to
>>> implement operators like:
>>> and/or
>>> If yes, should they be integrated into GraphImpl (like mask, subgraph,
>>> etc.) or provided as an external library? My feeling is that they are
>>> similar to mask. For consistency, they should be part of the graph
>>> implementation itself.
>>> What do you guys think? I would really like to move GraphX forward and
>>> help implement some of these.
>>> Looking forward to hearing your opinions.
>>> Tarek
