hama-user mailing list archives

From "Edward J. Yoon" <edward.y...@samsung.com>
Subject RE: Groomserer BSPPeerChild limit
Date Sun, 28 Jun 2015 23:44:13 GMT
> (0.7.0) because I do not understand YARN yet. It adds extra configurations

Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. YARN
configuration is only needed when you want to submit a BSP job to a YARN
cluster without a Hama cluster. So you don't need to worry about it. :-)

> distributed mode ? and is there any way to manage the server ? I mean right
> now, I have 3 machines with alot of configurations files and log files. It

You can use the web UI at http://masterserver_address:40013/bspmaster.jsp

To debug your program, please try the following:

1) Run a BSPMaster and Zookeeper at server1.
% bin/hama bspmaster
% bin/hama zookeeper

2) Run a Groom at server1 and server2.

% bin/hama groom

3) Check whether the daemons are running well. Then run your program using
the jar command at server1.

% bin/hama jar .....
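
One way to do the check in step 3 is to probe the daemon ports from every
machine, since a task that cannot reach ZooKeeper or the BSPMaster will hang.
The following is only a minimal sketch: PortProbe and canConnect are
hypothetical names, not part of Hama, and the class only tests whether a TCP
port accepts connections. It does not verify that the daemon behind the port
is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Hama): checks whether TCP ports accept connections. */
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortProbe host:port [host:port ...]
        for (String target : args) {
            int colon = target.lastIndexOf(':');
            if (colon < 0) {
                System.out.println(target + " -> skipped (expected host:port)");
                continue;
            }
            String host = target.substring(0, colon);
            int port = Integer.parseInt(target.substring(colon + 1));
            String state = canConnect(host, port, 2000) ? "reachable" : "NOT reachable";
            System.out.println(target + " -> " + state);
        }
    }
}
```

Run it on each machine with the addresses that apply to your cluster, for
example "java PortProbe server1:40013" for the web UI port mentioned above,
plus the ZooKeeper client port and groom ports from your own configuration.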

> In hama_[user]_bspmaster_.....log file I get the following exception. But
> this occurs in both cases when I run my job with 3 tasks or with 4 tasks

In fact, you should not see the initZK error log above.

--
Best Regards, Edward J. Yoon


-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Monday, June 29, 2015 8:18 AM
To: user@hama.apache.org
Subject: Re: Groomserer BSPPeerChild limit

I will try the things that you mentioned. I am not using the latest version
(0.7.0) because I do not understand YARN yet. It adds extra configurations
which make it harder for me to understand when things go wrong. Any
suggestions?

Further, are there any tools that you use for debugging while in
distributed mode? And is there any way to manage the server? I mean right
now, I have 3 machines with a lot of configuration files and log files. It
takes a lot of time. This makes me wonder how people who have 100s of
machines debug and manage the cluster.

Regards,
Behroz

On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <edward.yoon@samsung.com>
wrote:

> Hi,
>
> It looks like a ZooKeeper connection problem. Please check whether
> ZooKeeper is running and every task can connect to it.
>
> I would recommend that you stop the firewall during debugging, and please
> use the latest release, 0.7.0.
>
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Monday, June 29, 2015 7:34 AM
> To: user@hama.apache.org
> Subject: Re: Groomserer BSPPeerChild limit
>
> To figure out the issue, I was trying something else and found another
> weird issue. It might be a bug in Hama, but I am not sure. Both of the
> following lines throw an exception.
>
> System.out.println( peer.getPeerName(0)); //Exception
>
> System.out.println( peer.getNumPeers()); //Exception
>
>
> [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
> [time] java.lang.*RuntimeException: All peer names could not be retrieved!*
>   at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>   at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>   at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>   at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>   at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>   at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>   at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>
> On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <behroz89@gmail.com>
> wrote:
>
> > I think I have more information on the issue. I did some debugging and
> > found something quite strange.
> >
> > If I open my job with 6 tasks (3 tasks will run on MACHINE1 and 3 tasks
> > will run on MACHINE2),
> >
> >  -  the 3 tasks on MACHINE1 are frozen, and the strange thing is that the
> > processes do not even enter the setup function of the BSP class. I have
> > print statements in the setup function of the BSP class and it doesn't
> > print anything. I get empty files with zero size.
> >
> > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000000_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000000_0.log
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000001_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000001_0.log
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000002_0.err
> > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> > attempt_201506281624_0001_000002_0.log
> >
> > - On MACHINE2, the code enters the SETUP function of BSP class and prints
> > stuff. See the size of files generated on output. How is it possible that
> > in 3 tasks the code can enter BSP and in others it cannot ?
> >
> > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000003_0.err
> > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > attempt_201506281639_0001_000003_0.log
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000004_0.err
> > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> > attempt_201506281639_0001_000004_0.log
> > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> > attempt_201506281639_0001_000005_0.err
> > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> > attempt_201506281639_0001_000005_0.log
> >
> > - Hama Groom log file on MACHINE1 (which is frozen) shows
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000001_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000002_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000000_0' has started.
> >
> > - Hama Groom log file on MACHINE2 shows
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000003_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000004_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > 'attempt_201506281639_0001_000005_0' has started.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000004_0 is *done*.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000003_0 is *done*.
> > [time] INFO org.apache.hama.bsp.GroomServer: Task
> > attempt_201506281639_0001_000005_0 is *done*.
> >
> > Any clue what might be going wrong ?
> >
> > Regards,
> > Behroz
> >
> >
> >
> > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <behroz89@gmail.com>
> > wrote:
> >
> >> Here is the log file from that folder
> >>
> >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port 61001
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
> >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer address:b178b33b16cc port:61001
> >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
> >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
> >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting
> >>
> >>
> >> And my console shows the following output. Hama is frozen right now.
> >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: job_201506262331_0003
> >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
> >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
> >>
> >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <edwardyoon@apache.org>
> >> wrote:
> >>
> >>> Please check the task logs in $HAMA_HOME/logs/tasklogs folder.
> >>>
> >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <behroz89@gmail.com>
> >>> wrote:
> >>> > Yes, I thought so too. I ran the program through Eclipse with 20
> >>> > tasks and it works fine.
> >>> >
> >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <edwardyoon@apache.org>
> >>> > wrote:
> >>> >
> >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> >>> >> > run my program with 3 tasks, everything runs fine. But when I
> >>> >> > increase the tasks (to 4) by using "setNumBspTask", Hama freezes. I
> >>> >> > do not understand what can go wrong.
> >>> >>
> >>> >> It looks like a program bug. Have you run your program in local mode?
> >>> >>
> >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <behroz89@gmail.com>
> >>> >> wrote:
> >>> >> > Hi,
> >>> >> > In the current thread, I mentioned 3 issues. Issues 1 and 3 are
> >>> >> > resolved, but issue 2 is still giving me headaches.
> >>> >> >
> >>> >> > My problem:
> >>> >> > My cluster now consists of 3 machines, each of them (apparently)
> >>> >> > properly configured. When I start Hadoop and Hama from my master
> >>> >> > machine, I can see the processes started on the other 2 machines. If
> >>> >> > I check the maximum number of tasks that my cluster can support, I
> >>> >> > get 9 (3 tasks on each machine).
> >>> >> >
> >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I
> >>> >> > run my program with 3 tasks, everything runs fine. But when I
> >>> >> > increase the tasks (to 4) by using "setNumBspTask", Hama freezes. I
> >>> >> > do not understand what can go wrong.
> >>> >> >
> >>> >> > I checked the log files and things look fine. I just sometimes get
> >>> >> > an exception that Hama was not able to delete the system directory
> >>> >> > (bsp.system.dir) defined in hama-site.xml.
> >>> >> >
> >>> >> > Any help or clue would be great.
> >>> >> >
> >>> >> > Regards,
> >>> >> > Behroz Sikander
> >>> >> >
> >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <behroz89@gmail.com>
> >>> >> > wrote:
> >>> >> >
> >>> >> >> Thank you :)
> >>> >> >>
> >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <edwardyoon@apache.org>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> Hi,
> >>> >> >>>
> >>> >> >>> You can get the maximum number of available tasks with code like
> >>> >> >>> the following:
> >>> >> >>>
> >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
> >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
> >>> >> >>>
> >>> >> >>>     // Set to maximum
> >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <behroz89@gmail.com>
> >>> >> >>> wrote:
> >>> >> >>> > Hi,
> >>> >> >>> > 1) Thank you for this.
> >>> >> >>> > 2) Here are the images. I will look into the log files of the PI
> >>> >> >>> > example.
> >>> >> >>> >
> >>> >> >>> > *Result of JPS command on slave*
> >>> >> >>> > http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >>> >> >>> >
> >>> >> >>> > *Result of JPS command on Master*
> >>> >> >>> > http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >>> >> >>> >
> >>> >> >>> > 3) In my current case, I do not have any input submitted to the
> >>> >> >>> > job. During run time, I directly fetch data from HDFS. So, I am
> >>> >> >>> > looking for something like BSPJob.set*Max*NumBspTask().
> >>> >> >>> >
> >>> >> >>> > Regards,
> >>> >> >>> > Behroz
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <edwardyoon@apache.org>
> >>> >> >>> > wrote:
> >>> >> >>> >
> >>> >> >>> >> Hello,
> >>> >> >>> >>
> >>> >> >>> >> 1) You can get the filesystem URI from a configuration using
> >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
> >>> >> >>> >> fs.defaultFS property should be in hama-site.xml
> >>> >> >>> >>
> >>> >> >>> >>   <property>
> >>> >> >>> >>     <name>fs.defaultFS</name>
> >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
> >>> >> >>> >>     <description>
> >>> >> >>> >>       The name of the default file system. Either the literal
> >>> >> >>> >>       string "local" or a host:port for HDFS.
> >>> >> >>> >>     </description>
> >>> >> >>> >>   </property>
> >>> >> >>> >>
> >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
> >>> >> >>> >> looks like a cluster configuration issue. Please run the Pi example
> >>> >> >>> >> and look at the logs for more details. NOTE: you cannot attach
> >>> >> >>> >> images to the mailing list, so I can't see them.
> >>> >> >>> >>
> >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
> >>> >> >>> >> provided, the number of BSP tasks is basically driven by the
> >>> >> >>> >> number of DFS blocks. I'll fix it to be more flexible on HAMA-956.
> >>> >> >>> >>
> >>> >> >>> >> Thanks!
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander <behroz89@gmail.com>
> >>> >> >>> >> wrote:
> >>> >> >>> >> > Hi,
> >>> >> >>> >> > Recently, I moved from a single machine setup to a 2 machine
> >>> >> >>> >> > setup. I was successfully able to run my job that uses HDFS to
> >>> >> >>> >> > get data. I have 3 trivial questions.
> >>> >> >>> >> >
> >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP address of the
> >>> >> >>> >> > server running HDFS. I thought that Hama would automatically pick
> >>> >> >>> >> > it up from the configuration, but it does not. I am probably doing
> >>> >> >>> >> > something wrong. Right now my code works by using the following.
> >>> >> >>> >> >
> >>> >> >>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
> >>> >> >>> >> >
> >>> >> >>> >> > 2- On my master server, when I start Hama it automatically starts
> >>> >> >>> >> > Hama on the slave machine (all good). Both master and slave are
> >>> >> >>> >> > set as groomservers. This means that I have 2 servers to run my
> >>> >> >>> >> > job, which means that I can open more BSPPeerChild processes. If I
> >>> >> >>> >> > submit my jar with 3 bsp tasks, everything works fine. But when
> >>> >> >>> >> > I move to 4 tasks, Hama freezes. Here is the result of JPS
> >>> >> >>> >> > command on slave.
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > Result of JPS command on Master
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > You can see that it is only opening tasks on slaves but not on
> >>> >> >>> >> > master.
> >>> >> >>> >> >
> >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
> >>> >> >>> >> > hama-default.xml to 4 but still got the same result.
> >>> >> >>> >> >
> >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
> >>> >> >>> >> > possible. Is there any setting that I can use to achieve that?
> >>> >> >>> >> > Or does Hama pick up the values from hama-default.xml to open
> >>> >> >>> >> > tasks?
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > Regards,
> >>> >> >>> >> >
> >>> >> >>> >> > Behroz Sikander
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >>
> >>> >> >>> >> --
> >>> >> >>> >> Best Regards, Edward J. Yoon
> >>> >> >>> >>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> Best Regards, Edward J. Yoon
> >>> >> >>>
> >>> >> >>
> >>> >> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Best Regards, Edward J. Yoon
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon
> >>>
> >>
> >>
> >
>
>
>


