flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2661) Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs
Date Mon, 14 Sep 2015 16:20:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743766#comment-14743766
] 

ASF GitHub Bot commented on FLINK-2661:
---------------------------------------

Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/1124#issuecomment-140131099
  
    Hi @vasia ,
    
    There was no clear rejection in the discussion, so I thought I'd try my luck :)
    
    This technique could be enabled by using a flag only if we introduce a node split version
of all the library methods. Otherwise, for newly written code, users would still have to pass
a combine function according to their algorithm. They'd also have to pass the threshold that
differentiates high-degree vertices from low-degree vertices. What is more, they still need
to split and aggregate before and after each step.
    
    These are aspects that cannot be guessed and that highly depend on the algorithm. 
    
    Also, what I had in mind regarding this PR was that we would discuss and  try to improve
the code a bit, rather than immediately discarding it :(


> Add a Node Splitting Technique to Overcome the Limitations of Skewed Graphs
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-2661
>                 URL: https://issues.apache.org/jira/browse/FLINK-2661
>             Project: Flink
>          Issue Type: Task
>          Components: Gelly
>    Affects Versions: 0.10
>            Reporter: Andra Lungu
>            Assignee: Andra Lungu
>
> Skewed graphs raise unique challenges to computation models such as Gelly's vertex-centric
or GSA iterations. This is mainly because of the fact that these approaches uniformly process
vertices regardless of their degree distribution. 
> In vertex-centric, for instance, a skewed node will take more time to process its neighbors
compared to the other nodes in the graph. The first will act as a straggler causing the latter
to remain idle until it finishes its computation. 
> This issue can be mitigated by splitting a high-degree node into subnodes and evenly
distributing the edges to the the resulted subvertices. The computation will then be performed
on the split vertex. 
> To this end, we should add a Splitting API on top of Gelly which can help:
> - determine skewed nodes 
> - split them
> - merge them back at the end of the computation, given a user defined combiner.
> To illustrate the usage of these methods, we should add an example as well as a separate
entry in the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message