Date: Thu, 29 Mar 2018 02:57:00 +0000 (UTC)
From: "Matthias Boehm (JIRA)"
To: issues@systemml.apache.org
Subject: [jira] [Commented] (SYSTEMML-2197) Multi-threaded broadcast creation

    [ https://issues.apache.org/jira/browse/SYSTEMML-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418358#comment-16418358 ]

Matthias Boehm commented on SYSTEMML-2197:
------------------------------------------

Sure, I just updated the description; let me know if you need more details.

> Multi-threaded broadcast creation
> ---------------------------------
>
>                 Key: SYSTEMML-2197
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2197
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Priority: Major
>
> All Spark instructions that broadcast one of their input operands rely on a shared primitive, {{sec.getBroadcastForVariable(var)}}, to create partitioned broadcasts. A partitioned broadcast is a wrapper object around potentially many broadcast variables, which overcomes Spark's 2GB limit for compressed broadcasts. Each individual broadcast blocks the matrix into square blocks, so that tasks can access blocks directly without unnecessary per-task copies. So far, this broadcast creation is single-threaded.
>
> This task aims to parallelize the blocking of the given in-memory matrix into square blocks (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/data/PartitionedBlock.java#L82), as well as the subsequent partition creation and actual broadcasting (https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L548).
>
> For consistency, and to avoid excessive over-provisioning of threads, this multi-threading should use either the common internal thread pool or parallel Java streams, which likewise use the shared {{ForkJoinPool.commonPool}}. An example is the multi-threaded parallelization of RDDs, which similarly blocks a given matrix into its square blocks (see https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/controlprogram/context/SparkExecutionContext.java#L679).

-- 
This message was sent by Atlassian JIRA
(v7.6.3#76005)
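The blocking step that the task proposes to parallelize can be sketched in plain Java. This is a minimal illustration, not SystemML's PartitionedBlock code: it assumes a dense, row-major `double[]` input and a hypothetical helper `blockMatrix`, and it uses `IntStream.parallel()`, whose tasks run on `ForkJoinPool.commonPool()` as the description suggests.

```java
import java.util.stream.IntStream;

public class ParallelBlocking {
    // Hypothetical helper (not SystemML API): splits a dense row-major
    // rows x cols matrix into square blen x blen blocks; blocks on the right
    // and bottom boundaries may be smaller. Blocks are returned in row-major
    // block order, each as its own row-major double[].
    public static double[][] blockMatrix(double[] data, int rows, int cols, int blen) {
        int nrb = (rows + blen - 1) / blen; // number of block rows
        int ncb = (cols + blen - 1) / blen; // number of block columns
        double[][] blocks = new double[nrb * ncb][];
        // Blocks are independent, so they can be filled concurrently;
        // parallel streams execute on ForkJoinPool.commonPool().
        IntStream.range(0, nrb * ncb).parallel().forEach(b -> {
            int r0 = (b / ncb) * blen; // first row of this block
            int c0 = (b % ncb) * blen; // first column of this block
            int br = Math.min(blen, rows - r0);
            int bc = Math.min(blen, cols - c0);
            double[] blk = new double[br * bc];
            for (int i = 0; i < br; i++)
                System.arraycopy(data, (r0 + i) * cols + c0, blk, i * bc, bc);
            blocks[b] = blk; // each task writes a distinct slot, no locking needed
        });
        return blocks;
    }

    public static void main(String[] args) {
        int rows = 5, cols = 3, blen = 2;
        double[] data = new double[rows * cols];
        for (int i = 0; i < data.length; i++) data[i] = i;
        double[][] blocks = blockMatrix(data, rows, cols, blen);
        System.out.println(blocks.length); // prints 6 (3 block rows x 2 block columns)
    }
}
```

Because every task writes to a different array slot and the terminal `forEach` returns only after all tasks complete, the caller sees all blocks without extra synchronization.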
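The partitioned-broadcast idea from the description, wrapping several smaller broadcast variables so that no single one exceeds Spark's size limit, can be illustrated with a greedy size-based grouping of blocks. This is a hypothetical sketch, not SystemML's actual partitioning logic: the method `partition`, the per-block byte sizes, and the cap are assumptions, and the Spark broadcast call for each resulting group is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class BroadcastPartitioning {
    // Hypothetical helper: greedily groups consecutive blocks into partitions
    // whose total serialized size stays at or below maxBytes. Each partition
    // is a half-open range [start, end) of block indexes and would become one
    // broadcast variable. Assumes every single block fits within maxBytes.
    public static List<int[]> partition(long[] blockBytes, long maxBytes) {
        List<int[]> parts = new ArrayList<>();
        int start = 0;
        long acc = 0; // bytes accumulated in the current partition
        for (int b = 0; b < blockBytes.length; b++) {
            if (blockBytes[b] > maxBytes)
                throw new IllegalArgumentException("block " + b + " exceeds maxBytes");
            if (acc + blockBytes[b] > maxBytes) {
                parts.add(new int[]{start, b}); // close the current partition
                start = b;
                acc = 0;
            }
            acc += blockBytes[b];
        }
        if (start < blockBytes.length)
            parts.add(new int[]{start, blockBytes.length});
        return parts;
    }

    public static void main(String[] args) {
        for (int[] r : partition(new long[]{4, 4, 4, 4, 4}, 10))
            System.out.println("[" + r[0] + ", " + r[1] + ")");
    }
}
```

Once the ranges are known, each partition can be assembled independently, which is what makes the subsequent partition creation amenable to the same parallel-stream approach.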