Date: Tue, 12 Jan 2016 03:40:39 +0000 (UTC)
From: "wangwei (JIRA)"
To: dev@singa.incubator.apache.org
Subject: [jira] [Created] (SINGA-132) Optimize training on a single node with GPUs

wangwei created SINGA-132:
-----------------------------

             Summary: Optimize training on a single node with GPUs
                 Key: SINGA-132
                 URL: https://issues.apache.org/jira/browse/SINGA-132
             Project: Singa
          Issue Type: Improvement
            Reporter: wangwei
            Assignee: Haibo Chen

There are two training situations.

1. A single worker. In this case, there is no need to launch a separate server thread, because it would add communication cost between the worker and the server. Instead, we can create an Updater inside the Worker and call it to update the parameters locally. The driver's workflow should be changed for this case: there is no need for a stub thread or a server thread. The worker should run in the main thread, and the program terminates once the worker finishes.

2. Multiple workers. In this case, we need both workers and servers. First, we can make ZooKeeper an optional dependency, as it is used for Job ID generation and termination-condition checks. If no Job ID is available, we can always use the default Job ID (0). Since there is only one process, we don't need ZooKeeper to know the status of workers in other processes. Second, the communication among the worker, stub, and server should be optimized, e.g., using GPUDirect.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)