Mailing-List: contact user-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@giraph.apache.org
Received-SPF: pass (nike.apache.org: domain of matthewcornell@gmail.com
 designates 209.85.216.43 as permitted sender)
MIME-Version: 1.0
Sender: matthewcornell@gmail.com
In-Reply-To: 
 <CAC7_g14+a8Ke597FNZk_=sK2C5xK52ARVV7vBetvTvPD1Zcv6Q@mail.gmail.com>
References: 
 <CAC7_g14+a8Ke597FNZk_=sK2C5xK52ARVV7vBetvTvPD1Zcv6Q@mail.gmail.com>
From: Matthew Cornell <matt@matthewcornell.org>
Date: Tue, 30 Sep 2014 08:24:49 -0400
Message-ID: 
 <CABVPejEA4obUB4tQCQcKnynnDbsLpf151Epk310NNy+2JXMZVw@mail.gmail.com>
Subject: Re: Giraph 1.0 | Computation stuck at map 100% - reduce 0% for my
 algorithm only, at multi-node cluster
To: user <user@giraph.apache.org>
Content-Type: multipart/alternative; boundary=001a11c251de9f2eb20504477ac8

--001a11c251de9f2eb20504477ac8
Content-Type: text/plain; charset=UTF-8

I'm new, but in my limited experience a job that stalls at map 100% means
there was an error somewhere. In Giraph I've often found it hard to pin
down what that error actually was (e.g., out of memory), but the logs are
the first place to look. Just to clarify re: not finding outputs: Are
you going to http://<your_host.com>:50030/jobtracker.jsp and clicking on
the failed job id (e.g., job_201409251209_0029 ->
http://<your_host.com>:50030/jobdetails.jsp?jobid=job_201409251209_0029&refresh=0
)? From there, click the "map" link in the table to see its tasks. (Giraph
runs entirely as map tasks, IIUC.) You should see tasks for the master
plus each of your workers. If you click on one of them (e.g.,
task_201409251209_0029_m_000000 ->
http://<your_host.com>:50030/taskdetails.jsp?tipid=task_201409251209_0029_m_000000
) you should see which machine it ran on plus a link to the Task Logs. Click
on "All" and you should see three sections: stdout, stderr, and syslog.
The last of these usually contains hints about what went wrong. Check the
logs for every worker, not just the first one.
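
If clicking through gets tedious, here is a minimal Python sketch that
builds the same URLs from a host name and the job/task ids. The function
names are made up for illustration; port 50030 and the .jsp paths are simply
the ones from the URLs above, and I'm assuming the job id can be read off
the front of the task id, as in the example ids.

```python
from urllib.parse import urlencode

# Port taken from the jobtracker URLs above; change it if yours differs.
JOBTRACKER_PORT = 50030


def job_details_url(host, job_id, port=JOBTRACKER_PORT):
    """Build the jobdetails.jsp URL for a job id."""
    query = urlencode({"jobid": job_id, "refresh": 0})
    return f"http://{host}:{port}/jobdetails.jsp?{query}"


def task_details_url(host, task_id, port=JOBTRACKER_PORT):
    """Build the taskdetails.jsp URL for a task id."""
    query = urlencode({"tipid": task_id})
    return f"http://{host}:{port}/taskdetails.jsp?{query}"


def job_id_from_task_id(task_id):
    """Assumption: task_<timestamp>_<seq>_m_<n> belongs to job_<timestamp>_<seq>."""
    parts = task_id.split("_")
    if len(parts) < 3 or parts[0] != "task":
        raise ValueError(f"unexpected task id: {task_id!r}")
    return "_".join(["job", parts[1], parts[2]])
```

For example, task_details_url("your_host.com",
"task_201409251209_0029_m_000000") gives the task page for the master, and
the "Task Logs" link on that page still has to be followed by hand.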

Hope that helps.


On Tue, Sep 30, 2014 at 2:53 AM, Panagiotis Eustratiadis <
ep.pan.dit@gmail.com> wrote:

> Good morning,
>
> I have been having a problem the past few days which sadly I can't solve.
>
> First of all I set up a Hadoop 0.20.203.0 cluster of two nodes, a master
> and a slave. I followed this tutorial for the settings:
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Then I set up Giraph and built it successfully with Maven. When I run the
> SimpleShortestPathVertex with number of workers = 2 it runs properly and
> gives me results which I can view from either of the two nodes. Also the
> jobtracker at master:50030 and slave:50030 and everything else is working
> as expected.
>
> However, when I try to run my own algorithm it hangs at map 100% reduce 0%
> forever. I looked at SimpleShortestPathVertex for any special configuration
> and it has none. And the weird part is: the jobs at the jobtracker show no
> logs in stdout or stderr. The only thing readable is the map task info:
>
> task_201409300940_0001_m_000000 | 100.00% - MASTER_ZOOKEEPER_ONLY | 1
> finished out of 2 on superstep -1
> task_201409300940_0001_m_000001 | 100.00% | startSuperstep: WORKER_ONLY -
> Attempt=0, Superstep=-1
> task_201409300940_0001_m_000002 | 100.00% | startSuperstep: WORKER_ONLY -
> Attempt=0, Superstep=-1
>
> Is there anything I'm overlooking? I have Googled the obvious Stack
> Overflow solutions for two days now. Has anyone encountered anything
> similar?
>
> Regards,
> Panagiotis Eustratiadis.
>


-- 
Matthew Cornell | matt@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org


--001a11c251de9f2eb20504477ac8--