giraph-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components
Date Thu, 01 Jun 2017 22:26:04 GMT


ASF GitHub Bot commented on GIRAPH-1148:

Github user majakabiljo commented on a diff in the pull request:
    --- Diff: giraph-block-app-8/src/main/java/org/apache/giraph/block_app/library/prepare_graph/
    @@ -352,10 +352,15 @@ Block calculateConnectedComponentSizes(
         Pair<LongWritable, LongWritable> componentToReducePair = Pair.of(
             new LongWritable(), new LongWritable(1));
         LongWritable reusableLong = new LongWritable();
    -    return Pieces.reduceAndBroadcast(
    -        "CalcConnectedComponentSizes",
    +    // This reduce operation is stateless so we can use a single instance
    +    BasicMapReduce<LongWritable, LongWritable, LongWritable> reduceOperation =
             new BasicMapReduce<>(
    -            LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG),
    +            LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG);
    +    return Pieces.reduceAndBroadcastWithArrayOfHandles(
    +        "CalcConnectedComponentSizes",
    +        3137, /* Just using some large prime number */
    --- End diff --
    I can't come up with a reason why someone would want to change this value. It can only start
having problems at around a trillion components, which wouldn't work for many other reasons
anyway. For tiny graphs, this few reducers won't add any noticeable overhead, and for larger ones
that already worked, this is still an improvement, since the reducers are now processed on many machines.
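
To illustrate the idea behind splitting one large reducer into a fixed array of smaller ones, here is a minimal plain-Java sketch. It does not use the Giraph API: the class `ShardedComponentSizes`, its methods, and the choice of `componentId mod numShards` as the routing rule are all assumptions made for illustration, not the implementation in the pull request. Each component ID is routed to one of `numShards` independent maps, so no single map has to hold every component.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Giraph.
final class ShardedComponentSizes {
  // Route a component ID to a shard. floorMod keeps the result non-negative.
  static int shardOf(long componentId, int numShards) {
    return (int) Math.floorMod(componentId, (long) numShards);
  }

  // Count vertices per component, keeping one small map per shard
  // instead of a single map that holds every component.
  static List<Map<Long, Long>> countSharded(long[] componentIds, int numShards) {
    List<Map<Long, Long>> shards = new ArrayList<>(numShards);
    for (int i = 0; i < numShards; i++) {
      shards.add(new HashMap<>());
    }
    for (long id : componentIds) {
      // Each vertex contributes 1 to its component's size (a sum reduction).
      shards.get(shardOf(id, numShards)).merge(id, 1L, Long::sum);
    }
    return shards;
  }

  // Look up a component's size by consulting only the shard that owns it.
  static long sizeOf(List<Map<Long, Long>> shards, long componentId) {
    return shards.get(shardOf(componentId, shards.size()))
        .getOrDefault(componentId, 0L);
  }

  public static void main(String[] args) {
    // One entry per vertex: the ID of the component that vertex belongs to.
    long[] componentOfVertex = {5L, 5L, 3142L, 7L, 3142L, 5L};
    List<Map<Long, Long>> shards = countSharded(componentOfVertex, 3137);
    System.out.println(sizeOf(shards, 5L));    // 3
    System.out.println(sizeOf(shards, 3142L)); // 2
    System.out.println(sizeOf(shards, 7L));    // 1
  }
}
```

In this sketch components 5 and 3142 land in the same shard (3142 mod 3137 = 5), which is fine: a shard holds many components, it just holds roughly 1/3137 of them when IDs are spread evenly.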

> Connected components - make calculate sizes work with large number of components
> --------------------------------------------------------------------------------
>                 Key: GIRAPH-1148
>                 URL:
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
> Currently, if we have a graph with a large number of connected components, calculating connected
> component sizes fails because the reducer becomes too large. Use an array of handles instead.

This message was sent by Atlassian JIRA