From issues-return-80675-archive-asf-public=cust-asf.ponee.io@ignite.apache.org Fri Nov 2 16:25:04 2018
Date: Fri, 2 Nov 2018 15:25:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@ignite.apache.org
Reply-To: dev@ignite.apache.org
Subject: [jira] [Commented] (IGNITE-10133) ML: Switch to per-node TensorFlow worker strategy
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/IGNITE-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673261#comment-16673261 ]

ASF GitHub Bot commented on IGNITE-10133:
-----------------------------------------

GitHub user dmitrievanthony opened a pull request:

    https://github.com/apache/ignite/pull/5249

    IGNITE-10133: Switch to per-node TensorFlow worker strategy.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-10133

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/5249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5249

----
commit 13962c2c13d1cf945cac90ce003831d0a4a4fd33
Author: Anton Dmitriev
Date: 2018-11-02T15:20:52Z

    IGNITE-10133: Switch to per-node TensorFlow worker strategy.
----

> ML: Switch to per-node TensorFlow worker strategy
> -------------------------------------------------
>
>                 Key: IGNITE-10133
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10133
>             Project: Ignite
>          Issue Type: Improvement
>          Components: ml
>    Affects Versions: 2.8
>            Reporter: Anton Dmitriev
>            Assignee: Anton Dmitriev
>            Priority: Major
>             Fix For: 2.8
>
>
> Currently we start a TensorFlow worker process per cache partition. If a node is equipped with a GPU and TensorFlow uses it, each worker process acquires all of the GPU memory, so two worker processes that both try to acquire it will fail.
> To eliminate this problem and allow users to utilize GPUs during training, we need to switch to a per-node strategy: start one TensorFlow worker process per node, not per partition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
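For illustration, the per-node strategy described in the issue can be sketched as follows. This is a minimal plain-Python sketch with hypothetical names (the actual Ignite ML implementation is in Java): instead of launching one TensorFlow worker per cache partition, partitions are grouped by their hosting node and a single worker per node serves all of that node's partitions, so at most one process acquires each node's GPU.

```python
# Hypothetical sketch of the per-node worker strategy (not the real
# Ignite ML code). Given a mapping of cache partition -> hosting node,
# plan one TensorFlow worker per node instead of one per partition.

def plan_workers(partition_to_node):
    """Return {node: [partitions served by that node's single worker]}."""
    workers = {}
    for partition, node in partition_to_node.items():
        # All partitions on the same node share one worker process,
        # so the node's GPU is acquired by exactly one process.
        workers.setdefault(node, []).append(partition)
    return workers

# Example: 6 partitions spread across 2 nodes -> only 2 workers.
assignment = {0: "node-a", 1: "node-a", 2: "node-b",
              3: "node-b", 4: "node-a", 5: "node-b"}
plan = plan_workers(assignment)
print(plan)  # {'node-a': [0, 1, 4], 'node-b': [2, 3, 5]}
```

With the old per-partition scheme this example would have spawned six TensorFlow processes, and any two of them sharing a GPU would have contended for its full memory; the per-node grouping reduces this to one process per node.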