hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Groomserver BSPPeerChild limit
Date Mon, 29 Jun 2015 03:47:26 GMT
Okay, almost done. I guess you need to add the host names to your
/etc/hosts file. :-) Please see also
http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
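
For example (a sketch only; the hostname-to-IP mapping below is a guess from the
logs in this thread, so substitute your servers' real addresses), each node's
/etc/hosts would need entries along the lines of:

  172.17.0.3   b178b33b16cc
  172.17.0.7   8d4b512cf448

Once both hostnames resolve from both machines, the GroomServer registration
should go through.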

On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <behroz89@gmail.com> wrote:
> Server 2 was showing the exception that I posted in the previous email.
> Server 1 is showing the following exception:
>
> 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: starting
> 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
> 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
> groomd_8d4b512cf448_50000
> java.net.UnknownHostException: unknown host: 8d4b512cf448
> at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
> at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
> at org.apache.hama.ipc.Client.call(Client.java:888)
> at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
> at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
>
> I am looking into this issue.
>
> On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <behroz89@gmail.com> wrote:
>
>> Ok great. I was able to run the zk, groom and bspmaster on server 1. But
>> when I ran the groom on server 2, I got the following exception:
>>
>> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
>> establishing communication link with BSPMaster
>> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
>> reinitializing GroomServer: java.io.IOException: There is a problem in
>> establishing communication link with BSPMaster.
>> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
>> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <edwardyoon@apache.org>
>> wrote:
>>
>>> Here's my configurations:
>>>
>>> hama-site.xml:
>>>
>>>   <property>
>>>     <name>bsp.master.address</name>
>>>     <value>cluster-0:40000</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.default.name</name>
>>>     <value>hdfs://cluster-0:9000/</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>hama.zookeeper.quorum</name>
>>>     <value>cluster-0</value>
>>>   </property>
>>>
>>>
>>> % bin/hama zookeeper
>>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
>>> configuration, only one server specified (ignoring)
>>>
>>> Then open a new terminal and run the master with the following command:
>>>
>>> % bin/hama bspmaster
>>> ...
>>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
>>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
>>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>>>
>>>
>>>
>>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <edwardyoon@apache.org>
>>> wrote:
>>> > Hi,
>>> >
>>> > If you run the zk server too, BSPMaster will connect to zk and won't
>>> > throw exceptions.
>>> >
>>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <behroz89@gmail.com>
>>> wrote:
>>> >> Hi,
>>> >> Thank you for the information. I moved to Hama 0.7.0 and I still have the
>>> >> same problem.
>>> >> When I run % bin/hama bspmaster, I am getting the following exception
>>> >>
>>> >> INFO http.HttpServer: Port returned by
>>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>>> >> the listener on 40013
>>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
>>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
>>> >>  INFO http.HttpServer: Jetty bound to port 40013
>>> >>  INFO mortbay.log: jetty-6.1.14
>>> >>  INFO mortbay.log: Extract
>>> >> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
>>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
>>> >>  INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
>>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> >>  ERROR sync.ZKSyncBSPMasterClient:
>>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> >> KeeperErrorCode = ConnectionLoss for /bsp
>>> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>>> >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>>> >> at org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>>> >>  ERROR sync.ZKSyncBSPMasterClient:
>>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>> >> KeeperErrorCode = ConnectionLoss for /bsp
>>> >>
>>> >> *My zookeeper settings in hama-site.xml are as follows (right now, I am using
>>> >> just two servers, 172.17.0.3 and 172.17.0.7):*
>>> >> <property>
>>> >>   <name>hama.zookeeper.quorum</name>
>>> >>   <value>172.17.0.3,172.17.0.7</value>
>>> >>   <description>Comma separated list of servers in the ZooKeeper quorum.
>>> >>   For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>> >>   By default this is set to localhost for local and pseudo-distributed modes
>>> >>   of operation. For a fully-distributed setup, this should be set to a full
>>> >>   list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
>>> >>   this is the list of servers which we will start/stop ZooKeeper on.
>>> >>   </description>
>>> >> </property>
>>> >> ......
>>> >> <property>
>>> >>   <name>hama.zookeeper.property.clientPort</name>
>>> >>   <value>2181</value>
>>> >> </property>
>>> >>
>>> >> Is something wrong with my settings?
>>> >>
>>> >> Regards,
>>> >> Behroz Sikander
>>> >>
>>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
>>> edward.yoon@samsung.com>
>>> >> wrote:
>>> >>
>>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
>>> >>> > configurations
>>> >>>
>>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. A Yarn
>>> >>> configuration is only needed when you want to submit a BSP job to a Yarn
>>> >>> cluster without a Hama cluster. So you don't need to worry about it. :-)
>>> >>>
>>> >>> > distributed mode? and is there any way to manage the server? I mean right
>>> >>> > now, I have 3 machines with a lot of configuration files and log files. It
>>> >>>
>>> >>> You can use the web UI at http://masterserver_address:40013/bspmaster.jsp
>>> >>>
>>> >>> To debug your program, please try the following:
>>> >>>
>>> >>> 1) Run a BSPMaster and Zookeeper at server1.
>>> >>> % bin/hama bspmaster
>>> >>> % bin/hama zookeeper
>>> >>>
>>> >>> 2) Run a Groom at server1 and server2.
>>> >>>
>>> >>> % bin/hama groom
>>> >>>
>>> >>> 3) Check whether the daemons are running well. Then, run your program using
>>> >>> the jar command at server1.
>>> >>>
>>> >>> % bin/hama jar .....
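>>> >>>
>>> >>> For example, a quick check with jps (illustrative only; PIDs are arbitrary and
>>> >>> the main-class names may differ slightly between Hama versions): on server1 you
>>> >>> would expect something like
>>> >>>
>>> >>> % jps
>>> >>> 12001 BSPMasterRunner
>>> >>> 12002 ZooKeeperRunner
>>> >>> 12003 GroomServerRunner
>>> >>>
>>> >>> and on server2 just a GroomServerRunner. While a job runs, each task shows up
>>> >>> as an extra GroomServer$BSPPeerChild process.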
>>> >>>
>>> >>> > In the hama_[user]_bspmaster_.....log file I get the following exception. But
>>> >>> > this occurs in both cases, when I run my job with 3 tasks or with 4 tasks
>>> >>>
>>> >>> In fact, you should not see the initZK error log above.
>>> >>>
>>> >>> --
>>> >>> Best Regards, Edward J. Yoon
>>> >>>
>>> >>>
>>> >>> -----Original Message-----
>>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
>>> >>> Sent: Monday, June 29, 2015 8:18 AM
>>> >>> To: user@hama.apache.org
>>> >>> Subject: Re: Groomserver BSPPeerChild limit
>>> >>>
>>> >>> I will try the things that you mentioned. I am not using the latest version
>>> >>> (0.7.0) because I do not understand YARN yet. It adds extra configurations,
>>> >>> which makes it harder for me to understand when things go wrong. Any
>>> >>> suggestions?
>>> >>>
>>> >>> Further, are there any tools that you use for debugging while in
>>> >>> distributed mode? And is there any way to manage the servers? I mean, right
>>> >>> now I have 3 machines with a lot of configuration files and log files. It
>>> >>> takes a lot of time. This makes me wonder how people who have 100s of
>>> >>> machines debug and manage the cluster.
>>> >>>
>>> >>> Regards,
>>> >>> Behroz
>>> >>>
>>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
>>> edward.yoon@samsung.com>
>>> >>> wrote:
>>> >>>
>>> >>> > Hi,
>>> >>> >
>>> >>> > It looks like a zookeeper connection problem. Please check whether
>>> >>> > zookeeper is running and every task can connect to zookeeper.
>>> >>> >
>>> >>> > I would recommend stopping the firewall while debugging, and please
>>> >>> > use the latest 0.7.0 release.
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Best Regards, Edward J. Yoon
>>> >>> >
>>> >>> > -----Original Message-----
>>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
>>> >>> > Sent: Monday, June 29, 2015 7:34 AM
>>> >>> > To: user@hama.apache.org
>>> >>> > Subject: Re: Groomserver BSPPeerChild limit
>>> >>> >
>>> >>> > To figure out the issue, I was trying something else and found out
>>> >>> > another weird issue. It might be a bug in Hama, but I am not sure. Both of
>>> >>> > the following lines give an exception.
>>> >>> >
>>> >>> > System.out.println( peer.getPeerName(0)); //Exception
>>> >>> >
>>> >>> > System.out.println( peer.getNumPeers()); //Exception
>>> >>> >
>>> >>> >
>>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
>>> >>> >
>>> >>> > [time] java.lang.*RuntimeException: All peer names could not be retrieved!*
>>> >>> >
>>> >>> > at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>>> >>> >
>>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>>> >>> >
>>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>>> >>> >
>>> >>> > at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
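>>> >>> >
>>> >>> > For reference, a minimal self-contained sketch of the kind of setup() usage
>>> >>> > described above (the class name and Writable type parameters are illustrative,
>>> >>> > not the actual EVADMMBsp code):
>>> >>> >
>>> >>> > import java.io.IOException;
>>> >>> > import org.apache.hadoop.io.NullWritable;
>>> >>> > import org.apache.hama.bsp.BSP;
>>> >>> > import org.apache.hama.bsp.BSPPeer;
>>> >>> > import org.apache.hama.bsp.sync.SyncException;
>>> >>> >
>>> >>> > public class PeerInfoBSP extends
>>> >>> >     BSP<NullWritable, NullWritable, NullWritable, NullWritable, NullWritable> {
>>> >>> >
>>> >>> >   @Override
>>> >>> >   public void setup(
>>> >>> >       BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, NullWritable> peer)
>>> >>> >       throws IOException, SyncException, InterruptedException {
>>> >>> >     // Both calls ask ZooKeeper for the peer registry, so they fail (or hang)
>>> >>> >     // when a task cannot reach the quorum configured in hama-site.xml.
>>> >>> >     System.out.println(peer.getPeerName(0));
>>> >>> >     System.out.println(peer.getNumPeers());
>>> >>> >   }
>>> >>> >
>>> >>> >   @Override
>>> >>> >   public void bsp(
>>> >>> >       BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, NullWritable> peer)
>>> >>> >       throws IOException, SyncException, InterruptedException {
>>> >>> >     // no-op: this sketch only exercises the setup() calls above
>>> >>> >   }
>>> >>> > }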
>>> >>> >
>>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >>> > wrote:
>>> >>> >
>>> >>> > > I think I have more information on the issue. I did some debugging and
>>> >>> > > found something quite strange.
>>> >>> > >
>>> >>> > > If I run my job with 6 tasks (3 tasks will run on MACHINE1 and 3 tasks
>>> >>> > > will be opened on MACHINE2):
>>> >>> > >
>>> >>> > >  - The 3 tasks on MACHINE1 are frozen, and the strange thing is that the
>>> >>> > > processes do not even enter the SETUP function of the BSP class. I have
>>> >>> > > print statements in the setup function of the BSP class and it doesn't
>>> >>> > > print anything. I get empty files with zero size.
>>> >>> > >
>>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000000_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000000_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000001_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000001_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000002_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
>>> >>> > > attempt_201506281624_0001_000002_0.log
>>> >>> > >
>>> >>> > > - On MACHINE2, the code enters the SETUP function of the BSP class and
>>> >>> > > prints stuff. See the size of the files generated as output. How is it
>>> >>> > > possible that in 3 tasks the code can enter BSP and in the others it cannot?
>>> >>> > >
>>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000003_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000003_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000004_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000004_0.log
>>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000005_0.err
>>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
>>> >>> > > attempt_201506281639_0001_000005_0.log
>>> >>> > >
>>> >>> > > - The Hama Groom log file on MACHINE1 (which is frozen) shows:
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
>>> >>> > >
>>> >>> > > - The Hama Groom log file on MACHINE2 shows:
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
>>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
>>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
>>> >>> > >
>>> >>> > > Any clue what might be going wrong?
>>> >>> > >
>>> >>> > > Regards,
>>> >>> > > Behroz
>>> >>> > >
>>> >>> > >
>>> >>> > >
>>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
>>> behroz89@gmail.com>
>>> >>> > > wrote:
>>> >>> > >
>>> >>> > >> Here is the log file from that folder
>>> >>> > >>
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port 61001
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
>>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer address:b178b33b16cc port:61001
>>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
>>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At b178b33b16cc/172.17.0.7:61001
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
>>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting
>>> >>> > >>
>>> >>> > >> And my console shows the following output. Hama is frozen right now.
>>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: job_201506262331_0003
>>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
>>> >>> > >>
>>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
>>> >>> edwardyoon@apache.org>
>>> >>> > >> wrote:
>>> >>> > >>
>>> >>> > >>> Please check the task logs in the $HAMA_HOME/logs/tasklogs folder.
>>> >>> > >>>
>>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander
<
>>> behroz89@gmail.com
>>> >>> >
>>> >>> > >>> wrote:
>>> >>> > >>> > Yea. I also thought that. I ran the program
through eclipse
>>> with 20
>>> >>> > >>> tasks
>>> >>> > >>> > and it works fine.
>>> >>> > >>> >
>>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J.
Yoon <
>>> >>> > edwardyoon@apache.org
>>> >>> > >>> >
>>> >>> > >>> > wrote:
>>> >>> > >>> >
>>> >>> > >>> >> > When I run the PI example, it uses
9 tasks and runs fine.
>>> When I
>>> >>> > >>> run my
>>> >>> > >>> >> > program with 3 tasks, everything
runs fine. But when I
>>> increase
>>> >>> > the
>>> >>> > >>> tasks
>>> >>> > >>> >> > (to 4) by using "setNumBspTask".
Hama freezes. I do not
>>> >>> understand
>>> >>> > >>> what
>>> >>> > >>> >> can
>>> >>> > >>> >> > go wrong.
>>> >>> > >>> >>
>>> >>> > >>> >> It looks like a program bug. Have you run your program in
>>> >>> > >>> >> local mode?
>>> >>> > >>> >>
>>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz
Sikander <
>>> >>> > behroz89@gmail.com>
>>> >>> > >>> >> wrote:
>>> >>> > >>> >> > Hi,
>>> >>> > >>> >> > In the current thread, I mentioned
3 issues. Issue 1 and 3
>>> are
>>> >>> > >>> resolved
>>> >>> > >>> >> but
>>> >>> > >>> >> > issue number 2 is still giving me
headaches.
>>> >>> > >>> >> >
>>> >>> > >>> >> > My problem:
>>> >>> > >>> >> > My cluster now consists of 3 machines.
Each one of them
>>> properly
>>> >>> > >>> >> configured
>>> >>> > >>> >> > (Apparently). From my master machine
when I start Hadoop
>>> and
>>> >>> Hama,
>>> >>> > >>> I can
>>> >>> > >>> >> > see the processes started on other
2 machines. If I check
>>> the
>>> >>> > >>> maximum
>>> >>> > >>> >> tasks
>>> >>> > >>> >> > that my cluster can support then
I get 9 (3 tasks on each
>>> >>> > machine).
>>> >>> > >>> >> >
>>> >>> > >>> >> > When I run the PI example, it uses
9 tasks and runs fine.
>>> When I
>>> >>> > >>> run my
>>> >>> > >>> >> > program with 3 tasks, everything
runs fine. But when I
>>> increase
>>> >>> > the
>>> >>> > >>> tasks
>>> >>> > >>> >> > (to 4) by using "setNumBspTask".
Hama freezes. I do not
>>> >>> understand
>>> >>> > >>> what
>>> >>> > >>> >> can
>>> >>> > >>> >> > go wrong.
>>> >>> > >>> >> >
>>> >>> > >>> >> > I checked the logs files and things
look fine. I just
>>> sometimes
>>> >>> > get
>>> >>> > >>> an
>>> >>> > >>> >> > exception that hama was not able
to delete the sytem
>>> directory
>>> >>> > >>> >> > (bsp.system.dir) defined in the
hama-site.xml.
>>> >>> > >>> >> >
>>> >>> > >>> >> > Any help or clue would be great.
>>> >>> > >>> >> >
>>> >>> > >>> >> > Regards,
>>> >>> > >>> >> > Behroz Sikander
>>> >>> > >>> >> >
>>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM,
Behroz Sikander <
>>> >>> > >>> behroz89@gmail.com>
>>> >>> > >>> >> wrote:
>>> >>> > >>> >> >
>>> >>> > >>> >> >> Thank you :)
>>> >>> > >>> >> >>
>>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14
AM, Edward J. Yoon <
>>> >>> > >>> edwardyoon@apache.org
>>> >>> > >>> >> >
>>> >>> > >>> >> >> wrote:
>>> >>> > >>> >> >>
>>> >>> > >>> >> >>> Hi,
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>> You can get the maximum number of available tasks like the
>>> >>> > >>> >> >>> following code:
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>> >>> > >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     // Set to maximum
>>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>> >>> > >>> >> >>>
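>>> >>> > >>> >> >>> For context, a fuller sketch of how that snippet plugs into job setup
>>> >>> > >>> >> >>> (MyBSP is a placeholder for your own BSP class, and the usual
>>> >>> > >>> >> >>> org.apache.hama.bsp imports are assumed):
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     HamaConfiguration conf = new HamaConfiguration();
>>> >>> > >>> >> >>>     BSPJob bsp = new BSPJob(conf);
>>> >>> > >>> >> >>>     bsp.setJobName("max tasks example");
>>> >>> > >>> >> >>>     bsp.setBspClass(MyBSP.class);
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     // Ask the cluster how many tasks it can run, then use all of them.
>>> >>> > >>> >> >>>     BSPJobClient jobClient = new BSPJobClient(conf);
>>> >>> > >>> >> >>>     ClusterStatus cluster = jobClient.getClusterStatus(true);
>>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>     bsp.waitForCompletion(true);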
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at
11:20 PM, Behroz Sikander <
>>> >>> > >>> behroz89@gmail.com>
>>> >>> > >>> >> >>> wrote:
>>> >>> > >>> >> >>> > Hi,
>>> >>> > >>> >> >>> > 1) Thank you for this.
>>> >>> > >>> >> >>> > 2) Here are the images.
I will look into the log files
>>> of PI
>>> >>> > >>> example
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > *Result of JPS command
on slave*
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>>
>>> >>> > >>> >>
>>> >>> > >>>
>>> >>> >
>>> >>>
>>> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > *Result of JPS command
on Master*
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>>
>>> >>> > >>> >>
>>> >>> > >>>
>>> >>> >
>>> >>>
>>> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > 3) In my current case,
I do not have any input
>>> submitted to
>>> >>> > the
>>> >>> > >>> job.
>>> >>> > >>> >> >>> During
>>> >>> > >>> >> >>> > run time, I directly
fetch data from HDFS. So, I am
>>> looking
>>> >>> > for
>>> >>> > >>> >> >>> something
>>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask().
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > Regards,
>>> >>> > >>> >> >>> > Behroz
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > On Tue, Jun 23, 2015
at 12:57 AM, Edward J. Yoon <
>>> >>> > >>> >> edwardyoon@apache.org
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> > wrote:
>>> >>> > >>> >> >>> >
>>> >>> > >>> >> >>> >> Hello,
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration using
>>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the fs.defaultFS
>>> >>> > >>> >> >>> >> property should be in hama-site.xml
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>   <property>
>>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
>>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/</value>
>>> >>> > >>> >> >>> >>     <description>
>>> >>> > >>> >> >>> >>       The name of the default file system. Either the literal string
>>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
>>> >>> > >>> >> >>> >>     </description>
>>> >>> > >>> >> >>> >>   </property>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It looks
>>> >>> > >>> >> >>> >> like a cluster configuration issue. Please run the Pi example and look
>>> >>> > >>> >> >>> >> at the logs for more details. NOTE: you cannot attach images to the
>>> >>> > >>> >> >>> >> mailing list, so I can't see them.
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input is
>>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically driven by the number of
>>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on HAMA-956.
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> Thanks!
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> On Tue, Jun 23,
2015 at 2:33 AM, Behroz Sikander <
>>> >>> > >>> >> behroz89@gmail.com>
>>> >>> > >>> >> >>> >> wrote:
>>> >>> > >>> >> >>> >> > Hi,
>>> >>> > >>> >> >>> >> > Recently,
I moved from a single machine setup to a 2
>>> >>> > machine
>>> >>> > >>> >> setup.
>>> >>> > >>> >> >>> I was
>>> >>> > >>> >> >>> >> > successfully
able to run my job that uses the HDFS
>>> to get
>>> >>> > >>> data. I
>>> >>> > >>> >> >>> have 3
>>> >>> > >>> >> >>> >> > trivial questions
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > 1- To access
HDFS, I have to manually give the IP
>>> address
>>> >>> > of
>>> >>> > >>> >> server
>>> >>> > >>> >> >>> >> running
>>> >>> > >>> >> >>> >> > HDFS. I thought
that Hama will automatically pick
>>> from
>>> >>> the
>>> >>> > >>> >> >>> configurations
>>> >>> > >>> >> >>> >> > but it does
not. I am probably doing something
>>> wrong.
>>> >>> Right
>>> >>> > >>> now my
>>> >>> > >>> >> >>> code
>>> >>> > >>> >> >>> >> work
>>> >>> > >>> >> >>> >> > by using the
following.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > 2- On my master
server, when I start hama it
>>> >>> automatically
>>> >>> > >>> starts
>>> >>> > >>> >> >>> hama in
>>> >>> > >>> >> >>> >> > the slave
machine (all good). Both master and slave
>>> are
>>> >>> set
>>> >>> > >>> as
>>> >>> > >>> >> >>> >> groomservers.
>>> >>> > >>> >> >>> >> > This means
that I have 2 servers to run my job which
>>> >>> means
>>> >>> > >>> that I
>>> >>> > >>> >> can
>>> >>> > >>> >> >>> >> open
>>> >>> > >>> >> >>> >> > more BSPPeerChild
processes. And if I submit my jar
>>> with
>>> >>> 3
>>> >>> > >>> bsp
>>> >>> > >>> >> tasks
>>> >>> > >>> >> >>> then
>>> >>> > >>> >> >>> >> > everything
works fine. But when I move to 4 tasks,
>>> Hama
>>> >>> > >>> freezes.
>>> >>> > >>> >> >>> Here is
>>> >>> > >>> >> >>> >> the
>>> >>> > >>> >> >>> >> > result of
JPS command on slave.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Result of
JPS command on Master
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > You can see
that it is only opening tasks on slaves
>>> but
>>> >>> not
>>> >>> > >>> on
>>> >>> > >>> >> >>> master.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Note: I tried
to change the bsp.tasks.maximum
>>> property in
>>> >>> > >>> >> >>> >> hama-default.xml
>>> >>> > >>> >> >>> >> > to 4 but still
same result.
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > 3- I want
my cluster to open as many BSPPeerChild
>>> >>> processes
>>> >>> > >>> as
>>> >>> > >>> >> >>> possible.
>>> >>> > >>> >> >>> >> Is
>>> >>> > >>> >> >>> >> > there any
setting that can I do to achieve that ?
>>> Or hama
>>> >>> > >>> picks up
>>> >>> > >>> >> >>> the
>>> >>> > >>> >> >>> >> > values from
hama-default.xml to open tasks ?
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Regards,
>>> >>> > >>> >> >>> >> >
>>> >>> > >>> >> >>> >> > Behroz Sikander
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>> >> --
>>> >>> > >>> >> >>> >> Best Regards, Edward
J. Yoon
>>> >>> > >>> >> >>> >>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>> --
>>> >>> > >>> >> >>> Best Regards, Edward J.
Yoon
>>> >>> > >>> >> >>>
>>> >>> > >>> >> >>
>>> >>> > >>> >> >>
>>> >>> > >>> >>
>>> >>> > >>> >>
>>> >>> > >>> >>
>>> >>> > >>> >> --
>>> >>> > >>> >> Best Regards, Edward J. Yoon
>>> >>> > >>> >>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>>
>>> >>> > >>> --
>>> >>> > >>> Best Regards, Edward J. Yoon
>>> >>> > >>>
>>> >>> > >>
>>> >>> > >>
>>> >>> > >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >
>>> >
>>> >
>>> > --
>>> > Best Regards, Edward J. Yoon
>>>
>>>
>>>
>>> --
>>> Best Regards, Edward J. Yoon
>>>
>>
>>



-- 
Best Regards, Edward J. Yoon
