Mailing-List: contact user-help@giraph.apache.org; run by ezmlm
Reply-To: user@giraph.apache.org
MIME-Version: 1.0
From: Matthew Cornell
Date: Tue, 30 Sep 2014 08:24:49 -0400
Subject: Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my algorithm only, at multi-node cluster
To: user
Content-Type: multipart/alternative; boundary=001a11c251de9f2eb20504477ac8

--001a11c251de9f2eb20504477ac8
Content-Type: text/plain; charset=UTF-8

I'm new, but in my meager experience when it stops at map 100% it means
there was an error somewhere. In Giraph I've often found it difficult to
pin down what that error actually was (e.g., out of memory), but the logs
are the first place to look.

Just to clarify re: not finding outputs: Are you going to
http://<your_host.com>:50030/jobtracker.jsp and clicking on the failed job
id (e.g., job_201409251209_0029 ->
http://<your_host.com>:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0
)? From there, click the "map" link in the table to see its tasks. (Giraph
runs entirely as a map task, IIUC.) You should see tasks for the master
plus your workers. If you click on one of them (e.g.,
task_201409251209_0029_m_000000 ->
http://<your_host.com>:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_000000
) you should see what machine it ran on plus a link to the Task Logs. Click
on "All" and you should see three sections for stdout, stderr, and syslog,
the last of which usually contains hints about what went wrong. You should
check all the worker logs.

Hope that helps.

On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis <
ep.pan.dit@gmail.com> wrote:

> Good morning,
>
> I have been having a problem the past few days which sadly I can't solve.
>
> First of all I set up a Hadoop 0.20.203.0 cluster of two nodes, a master
> and a slave. I followed this tutorial for the settings:
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Then I set up Giraph, and I built it properly with Maven. When I run the
> SimpleShortestPathVertex with number of workers = 2 it runs properly, and
> gives me results which I can view from either of the two nodes. Also the
> jobtracker at master:50030 and slave:50030 and everything else is working
> as expected.
>
> However, when I try to run my own algorithm it hangs at map 100% reduce 0%
> forever. I looked at SimpleShortestPathVertex for any configurations and it
> has none. And the weird part is: the jobs at the jobtracker have no logs at
> stdout or stderr. The only thing readable is the map task info:
>
> task_201409300940_0001_m_000000 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1 finished out of 2 on superstep -1
> task_201409300940_0001_m_000001 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
> task_201409300940_0001_m_000002 | 100.00% | startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
>
> Is there anything I'm overlooking? I have Googled the obvious Stack
> Overflow solutions for two days now. Has anyone encountered anything
> similar?
>
> Regards,
> Panagiotis Eustratiadis.
>

--
Matthew Cornell | matt@matthewcornell.org | 413-626-3621 | 34 Dickinson Street, Amherst MA 01002 | matthewcornell.org

--001a11c251de9f2eb20504477ac8--
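
The log-hunting path described in the reply (jobtracker page, then job details, then each map task's details page) can be sketched as a few URL builders. This is only an illustration under stated assumptions: the port 50030 and the page names are the ones quoted in the message, the function names are hypothetical, and the job-to-task naming (job_X -> task_X_m_NNNNNN) is inferred from the ids shown in the thread rather than from any Hadoop documentation.

```python
from typing import List
from urllib.parse import urlencode

JOBTRACKER_PORT = 50030  # port used in the URLs quoted in the message


def jobtracker_url(host: str) -> str:
    """Landing page that lists running, completed, and failed jobs."""
    return f"http://{host}:{JOBTRACKER_PORT}/jobtracker.jsp"


def job_details_url(host: str, job_id: str) -> str:
    """Details page for one job; the 'map' link there lists its tasks."""
    query = urlencode({"jobid": job_id, "refresh": 0})
    return f"http://{host}:{JOBTRACKER_PORT}/jobdetails.jsp?{query}"


def task_details_url(host: str, task_id: str) -> str:
    """Details page for one task; it links to that task's logs."""
    query = urlencode({"tipid": task_id})
    return f"http://{host}:{JOBTRACKER_PORT}/taskdetails.jsp?{query}"


def map_task_ids(job_id: str, task_count: int) -> List[str]:
    """Derive map task ids. Assumption: the naming pattern seen in the thread."""
    if not job_id.startswith("job_"):
        raise ValueError(f"unexpected job id: {job_id!r}")
    suffix = job_id[len("job_"):]
    return [f"task_{suffix}_m_{i:06d}" for i in range(task_count)]


if __name__ == "__main__":
    host = "your_host.com"  # placeholder, as in the message
    job = "job_201409300940_0001"
    print(jobtracker_url(host))
    print(job_details_url(host, job))
    # One master task plus two workers, as in the task list quoted above.
    for task in map_task_ids(job, 3):
        print(task_details_url(host, task))
```

Printing one task URL per master and worker makes it harder to skip a worker when checking logs, which matters because the failing task is not necessarily the first one listed.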