Date: Thu, 1 Jun 2017 22:26:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: giraph-dev@incubator.apache.org
Subject: [jira] [Commented] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components

    [ https://issues.apache.org/jira/browse/GIRAPH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033817#comment-16033817 ]

ASF GitHub Bot commented on GIRAPH-1148:
----------------------------------------

Github user majakabiljo commented on a diff in the pull request:

    https://github.com/apache/giraph/pull/39#discussion_r119744463

    --- Diff: giraph-block-app-8/src/main/java/org/apache/giraph/block_app/library/prepare_graph/UndirectedConnectedComponents.java ---
    @@ -352,10 +352,15 @@ Block calculateConnectedComponentSizes(
         Pair componentToReducePair = Pair.of(
             new LongWritable(), new LongWritable(1));
         LongWritable reusableLong = new LongWritable();
    -    return Pieces.reduceAndBroadcast(
    -        "CalcConnectedComponentSizes",
    +    // This reduce operation is stateless so we can use a single instance
    +    BasicMapReduce reduceOperation = new BasicMapReduce<>(
    -            LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG),
    +            LongTypeOps.INSTANCE, LongTypeOps.INSTANCE, SumReduce.LONG);
    +    return Pieces.reduceAndBroadcastWithArrayOfHandles(
    +        "CalcConnectedComponentSizes",
    +        3137, /* Just using some large prime number */
    --- End diff --

    I can't come up with a reason why someone would want to change it. This would start having problems only at around a trillion components, which wouldn't work for many other reasons anyway. For tiny graphs, this few reducers won't add any overhead, and for larger ones that currently work, this is still an improvement, since the reducers are now processed on many machines.

> Connected components - make calculate sizes work with large number of components
> --------------------------------------------------------------------------------
>
>                 Key: GIRAPH-1148
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1148
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>
> Currently, if we have a graph with a large number of connected components, calculating connected component sizes fails because the reducer becomes too large. Use an array of handles instead.
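
The idea of replacing one reducer that holds every component with an array of smaller ones can be sketched in plain Java. This is only an illustration and does not use Giraph's API: the class ShardedComponentSizes and its methods shardOf, countSizes, and sizeOf are hypothetical names, and choosing the shard as the component id modulo the shard count is an assumption about how keys might be spread, not something the diff confirms. Only the shard count of 3137 is taken from the diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; none of these names are Giraph APIs. */
final class ShardedComponentSizes {
  /** Shard count taken from the diff under review. */
  static final int NUM_SHARDS = 3137;

  /** Picks a shard for a component id (assumed scheme: id modulo shard count). */
  static int shardOf(long componentId, int numShards) {
    // floorMod keeps the result in [0, numShards) even for negative ids.
    return (int) Math.floorMod(componentId, (long) numShards);
  }

  /** Counts vertices per component, keeping one small map per shard. */
  static List<Map<Long, Long>> countSizes(long[] componentOfVertex, int numShards) {
    List<Map<Long, Long>> shards = new ArrayList<>(numShards);
    for (int i = 0; i < numShards; i++) {
      shards.add(new HashMap<>());
    }
    for (long componentId : componentOfVertex) {
      // Each vertex contributes 1 to its component; sums are merged per shard.
      shards.get(shardOf(componentId, numShards)).merge(componentId, 1L, Long::sum);
    }
    return shards;
  }

  /** Looks up a component's size in the single shard that can contain it. */
  static long sizeOf(List<Map<Long, Long>> shards, long componentId) {
    return shards.get(shardOf(componentId, shards.size())).getOrDefault(componentId, 0L);
  }

  public static void main(String[] args) {
    // One entry per vertex: the id of the component that vertex belongs to.
    long[] componentOfVertex = {7L, 7L, 3144L, 7L, 42L, 3144L};
    List<Map<Long, Long>> shards = countSizes(componentOfVertex, NUM_SHARDS);
    // Components 7 and 3144 land in the same shard but keep separate counts.
    System.out.println(sizeOf(shards, 7L));    // prints 3
    System.out.println(sizeOf(shards, 3144L)); // prints 2
    System.out.println(sizeOf(shards, 42L));   // prints 1
  }
}
```

In this sketch each shard holds roughly 1/3137 of the components, which mirrors the reasoning in the comment: the extra shards cost little on tiny inputs, and on large inputs no single map has to hold every component.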