Date: Fri, 14 Aug 2015 06:46:45 +0000 (UTC)
From: "Dmitry Yaraev (JIRA)"
To: dev@mahout.apache.org
Subject: [jira] [Updated] (MAHOUT-1767) Unable to run tests on H2O enigne in distributed mode

[ https://issues.apache.org/jira/browse/MAHOUT-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Yaraev updated MAHOUT-1767:
----------------------------------
Description: 
When one follows the instructions in [README.md for H2O module|https://github.com/apache/mahout/blob/master/h2o/README.md] and tries to run the tests in distributed mode, the tests run only in local mode. The instructions consist of three steps:
# {code}
host-1:~/mahout$ ./bin/mahout h2o-node
...
.. INFO: Cloud of size 1 formed [/W.X.Y.Z:54321]
{code}
# {code}
host-2:~/mahout$ ./bin/mahout h2o-node
...
..
INFO: Cloud of size 2 formed [/A.B.C.D:54322]
{code}
# {code}
host-N:~/mahout/h2o$ mvn test
...
.. INFO: Cloud of size 3 formed [/E.F.G.H:54323]
...
All tests passed.
...
host-N:~/mahout/h2o$
{code}
The first two steps start the worker nodes; the last one runs the tests. According to the instructions, launching the tests starts one more worker, which should join the cloud formed by the other worker nodes. It does not join them, however, because it uses a different cloud name (the _masterURL_ in the code). The code contains the following:
{code:title=DistributedH2OSuite.scala}
...
mahoutCtx = mahoutH2OContext("mah2out" + System.currentTimeMillis())
...
{code}
After we removed the generated suffix from the cloud name, distributed mode started to work.

was:
When one follows the instructions in [README.md for H2O module|https://github.com/apache/mahout/blob/master/h2o/README.md] and tries to run the tests in distributed mode, the tests run only in local mode. The instructions consist of three steps:
# {code}
host-1:~/mahout$ ./bin/mahout h2o-node
...
.. INFO: Cloud of size 1 formed [/W.X.Y.Z:54321]
{code}
# {code}
host-2:~/mahout$ ./bin/mahout h2o-node
...
.. INFO: Cloud of size 2 formed [/A.B.C.D:54322]
{code}
# {code}
host-N:~/mahout/h2o$ mvn test
...
.. INFO: Cloud of size 3 formed [/E.F.G.H:54323]
...
All tests passed.
...
host-N:~/mahout/h2o$
{code}
The first two steps start the worker nodes; the last one runs the tests. According to the instructions, launching the tests starts one more worker, which should join the cloud formed by the other worker nodes. It does not join them, however, because it uses a different cloud name (the _masterURL_ in the code). The code contains the following:
{code:title=DistributedH2OSuite.scala}
...
mahoutCtx = mahoutH2OContext("mah2out" + System.currentTimeMillis())
...
{code}
We tried removing the generated suffix from the cloud name.
After that it started to work.


> Unable to run tests on H2O enigne in distributed mode
> -----------------------------------------------------
>
>                 Key: MAHOUT-1767
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1767
>             Project: Mahout
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 0.11.0
>            Reporter: Dmitry Yaraev
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
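The cloud-name mismatch described in this issue can be illustrated with a minimal Scala sketch. It is not Mahout or H2O code: `testCloudName` and `joinsCloud` are hypothetical helpers that model only the rule stated in the report, namely that a node joins a cloud only when the cloud names match. It also assumes, as the report implies, that the workers started with `./bin/mahout h2o-node` use the plain name `mah2out`. It is written as a script (top-level statements), so it can be run with a Scala script runner.

```scala
// Sketch only: models the naming rule from MAHOUT-1767, not the real H2O join logic.

// Hypothetical helper: the cloud name a test suite would request.
// withTimestamp = true mirrors "mah2out" + System.currentTimeMillis()
// from DistributedH2OSuite.scala.
def testCloudName(base: String, withTimestamp: Boolean): String =
  if (withTimestamp) base + System.currentTimeMillis() else base

// Hypothetical model of the join check: the names must be identical.
def joinsCloud(workerCloud: String, testCloud: String): Boolean =
  workerCloud == testCloud

// Assumption: the workers started via h2o-node formed a cloud named "mah2out".
val workerCloud = "mah2out"

val suffixed = testCloudName("mah2out", withTimestamp = true)  // e.g. "mah2out1439534805123"
val fixed    = testCloudName("mah2out", withTimestamp = false) // "mah2out"

println(s"$suffixed joins '$workerCloud': ${joinsCloud(workerCloud, suffixed)}") // false
println(s"$fixed joins '$workerCloud': ${joinsCloud(workerCloud, fixed)}")       // true
```

Under this model, any per-run suffix makes the test node's name differ from the workers' name, so it forms a separate cloud; dropping the suffix, as the report does, makes the names match.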