hama-user mailing list archives

From "Edward J. Yoon" <edward.y...@samsung.com>
Subject RE: GroomServer BSPPeerChild limit
Date Mon, 03 Aug 2015 22:39:40 GMT
Hi,

If you need to kill the single JVM manually, then your program has an infinite
loop and it is running as a local BSP job.
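
For context, Hama treats a job as local when bsp.master.address is set to
"local"; a minimal hama-site.xml sketch of the two modes (the host and port
are placeholders):

  <property>
    <name>bsp.master.address</name>
    <value>local</value>
    <!-- "local" runs every task inside one local JVM; a host:port value -->
    <!-- such as cluster-0:40000 targets a real distributed BSPMaster.   -->
  </property>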

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Monday, August 03, 2015 7:39 PM
To: user@hama.apache.org
Subject: Re: GroomServer BSPPeerChild limit

I tried bin/stop-bspd.sh, but the script reports that there is no
groom/bspmaster process, so I have to kill them manually. I am working with
Hama 0.7.0.
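
For reference, a gentler manual shutdown than kill -9 is usually possible; a
sketch, assuming jps reports the Hama daemons under their main-class names
(e.g. BSPMasterRunner, GroomServerRunner, ZooKeeperRunner):

% jps                  # list the running Java daemons and their PIDs
% kill <process_id>    # plain SIGTERM lets a daemon shut down cleanly;
                       # fall back to kill -9 only if it hangs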

On Mon, Aug 3, 2015 at 1:07 AM, Edward J. Yoon <edward.yoon@samsung.com>
wrote:

> Hi,
>
> Congrats! You can shut down the cluster with the following command: $
> bin/stop-bspd.sh
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:behroz89@gmail.com]
> Sent: Sunday, August 02, 2015 11:27 PM
> To: user@hama.apache.org
> Subject: Re: GroomServer BSPPeerChild limit
>
> Hi,
> Yesterday I got the fix for the /etc/hosts file, and now I can modify it. I
> ran the cluster with 3 machines and everything went fine.
>
> Thanks :)
>
> By the way, if I start a process using the following command, how can I
> stop it? Right now I am using kill -9 <process_id>.
> % ./bin/hama bspmaster
>
> On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander <behroz89@gmail.com>
> wrote:
>
> > Ok, perfect. I do not have write access to /etc/hosts, which is why I was
> > using the IP addresses. I will talk to the administrator.
> >
> > By the way, I am wondering how the Pi example was able to communicate
> > with the other servers. The Pi example runs fine even if I have more than
> > 3 tasks (it works on both machines).
> >
> > On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <edwardyoon@apache.org>
> > wrote:
> >
> >> Okay, almost done. I guess you need to add the host names to your
> >> /etc/hosts file. :-) Please see also
> >>
> >>
> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
> >>
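
A hedged sketch of the /etc/hosts entries being suggested; the hostnames below
are the container IDs from the logs in this thread, and the IP-to-hostname
pairs are illustrative, so they must be matched to your actual machines:

172.17.0.3   b178b33b16cc   # e.g. the node running BSPMaster and ZooKeeper
172.17.0.7   8d4b512cf448   # e.g. the node running the second GroomServer

Every node needs an entry for every other node, so that RPC callers can
resolve the hostnames under which the grooms register.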
> >> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <behroz89@gmail.com>
> >> wrote:
> >> > Server 2 was showing the exception that I posted in the previous
> >> > email. Server 1 is showing the following exception:
> >> >
> >> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000:
> >> starting
> >> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is
> >> added.
> >> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer
> >> > groomd_8d4b512cf448_50000
> >> > java.net.UnknownHostException: unknown host: 8d4b512cf448
> >> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
> >> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
> >> > at org.apache.hama.ipc.Client.call(Client.java:888)
> >> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
> >> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
> >> >
> >> > I am looking into this issue.
> >> >
> >> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <behroz89@gmail.com>
> >> wrote:
> >> >
> >> >> Ok, great. I was able to run the zk, groom, and bspmaster on server 1.
> >> >> But when I ran the groom on server 2, I got the following exception:
> >> >>
> >> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
> >> >> establishing communication link with BSPMaster
> >> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
> >> >> reinitializing GroomServer: java.io.IOException: There is a problem
> in
> >> >> establishing communication link with BSPMaster.
> >> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
> >> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
> >> >> at java.lang.Thread.run(Thread.java:745)
> >> >>
> >> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <
> edwardyoon@apache.org
> >> >
> >> >> wrote:
> >> >>
> >> >>> Here are my configurations:
> >> >>>
> >> >>> hama-site.xml:
> >> >>>
> >> >>>   <property>
> >> >>>     <name>bsp.master.address</name>
> >> >>>     <value>cluster-0:40000</value>
> >> >>>   </property>
> >> >>>
> >> >>>   <property>
> >> >>>     <name>fs.default.name</name>
> >> >>>     <value>hdfs://cluster-0:9000/</value>
> >> >>>   </property>
> >> >>>
> >> >>>   <property>
> >> >>>     <name>hama.zookeeper.quorum</name>
> >> >>>     <value>cluster-0</value>
> >> >>>   </property>
> >> >>>
> >> >>>
> >> >>> % bin/hama zookeeper
> >> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
> >> >>> configuration, only one server specified (ignoring)
> >> >>>
> >> >>> Then open a new terminal and run the master with the following command:
> >> >>>
> >> >>> % bin/hama bspmaster
> >> >>> ...
> >> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK
> >> false
> >> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync
> Client
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000:
> >> starting
> >> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000:
> >> starting
> >> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <
> >> edwardyoon@apache.org>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > If you run the zk server too, BSPMaster will connect to zk and
> >> >>> > won't throw exceptions.
> >> >>> >
> >> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <
> >> behroz89@gmail.com>
> >> >>> wrote:
> >> >>> >> Hi,
> >> >>> >> Thank you for the information. I moved to Hama 0.7.0 and I still
> >> >>> >> have the same problem.
> >> >>> >> When I run % bin/hama bspmaster, I get the following exception:
> >> >>> >>
> >> >>> >> INFO http.HttpServer: Port returned by
> >> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
> >> >>> Opening
> >> >>> >> the listener on 40013
> >> >>> >>  INFO http.HttpServer: listener.getLocalPort() returned 40013
> >> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
> >> >>> >>  INFO http.HttpServer: Jetty bound to port 40013
> >> >>> >>  INFO mortbay.log: jetty-6.1.14
> >> >>> >>  INFO mortbay.log: Extract
> >> >>> >>
> >> >>>
> >>
> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
> >> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
> >> >>> >>  INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc
> >> :40013
> >> >>> >>  INFO bsp.BSPMaster: Cleaning up the system directory
> >> >>> >>  INFO bsp.BSPMaster: hdfs://
> >> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system
> >> >>> >>  INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
> >> >>> >>  INFO sync.ZKSyncClient: Initializing ZK Sync Client
> >> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> >>> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> >> >>> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> >> >>> >> at
> >> >>> >>
> >> >>>
> >>
> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
> >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
> >> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
> >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> >> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
> >> >>> >>  ERROR sync.ZKSyncBSPMasterClient:
> >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp
> >> >>> >>
> >> >>> >> *My zookeeper settings in hama-site.xml are as follows (right now,
> >> >>> >> I am using just two servers, 172.17.0.3 and 172.17.0.7):*
> >> >>> >> <property>
> >> >>> >>                  <name>hama.zookeeper.quorum</name>
> >> >>> >>                  <value>172.17.0.3,172.17.0.7</value>
> >> >>> >>                  <description>Comma separated list of servers in
> >> the
> >> >>> >> ZooKeeper quorum.
> >> >>> >>                  For example, "host1.mydomain.com,
> >> host2.mydomain.com,
> >> >>> >> host3.mydomain.com".
> >> >>> >>                  By default this is set to localhost for local
> and
> >> >>> >> pseudo-distributed modes
> >> >>> >>                  of operation. For a fully-distributed setup,
> this
> >> >>> should
> >> >>> >> be set to a full
> >> >>> >>                  list of ZooKeeper quorum servers. If
> >> HAMA_MANAGES_ZK
> >> >>> is
> >> >>> >> set in hama-env.sh
> >> >>> >>                  this is the list of servers which we will
> >> start/stop
> >> >>> >> ZooKeeper on.
> >> >>> >>                  </description>
> >> >>> >>         </property>
> >> >>> >>        ......
> >> >>> >>        <property>
> >> >>> >>                  <name>hama.zookeeper.property.clientPort</name>
> >> >>> >>                  <value>2181</value>
> >> >>> >>          </property>
> >> >>> >>
> >> >>> >> Is something wrong with my settings?
> >> >>> >>
> >> >>> >> Regards,
> >> >>> >> Behroz Sikander
> >> >>> >>
> >> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <
> >> >>> edward.yoon@samsung.com>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
> >> >>> >>> configurations
> >> >>> >>>
> >> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS.
> >> >>> >>> YARN configuration is only needed when you want to submit a BSP
> >> >>> >>> job to a YARN cluster without a Hama cluster, so you don't need
> >> >>> >>> to worry about it. :-)
> >> >>> >>>
> >> >>> >>> > distributed mode? And is there any way to manage the servers?
> >> >>> >>> > I mean, right now I have 3 machines with a lot of configuration
> >> >>> >>> > files and log files. It
> >> >>> >>>
> >> >>> >>> You can use the web UI at
> >> >>> >>> http://masterserver_address:40013/bspmaster.jsp
> >> >>> >>>
> >> >>> >>> To debug your program, please try the following:
> >> >>> >>>
> >> >>> >>> 1) Run a BSPMaster and Zookeeper at server1.
> >> >>> >>> % bin/hama bspmaster
> >> >>> >>> % bin/hama zookeeper
> >> >>> >>>
> >> >>> >>> 2) Run a Groom at server1 and server2.
> >> >>> >>>
> >> >>> >>> % bin/hama groom
> >> >>> >>>
> >> >>> >>> 3) Check whether the daemons are running well. Then, run your
> >> >>> >>> program using the jar command at server1.
> >> >>> >>>
> >> >>> >>> % bin/hama jar .....
> >> >>> >>>
> >> >>> >>> > In the hama_[user]_bspmaster_.....log file I get the following
> >> >>> >>> > exception. But this occurs in both cases, whether I run my job
> >> >>> >>> > with 3 tasks or with 4 tasks
> >> >>> >>>
> >> >>> >>> In fact, you should not see the initZK error log above.
> >> >>> >>>
> >> >>> >>> --
> >> >>> >>> Best Regards, Edward J. Yoon
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> -----Original Message-----
> >> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> >>> >>> Sent: Monday, June 29, 2015 8:18 AM
> >> >>> >>> To: user@hama.apache.org
> >> >>> >>> Subject: Re: GroomServer BSPPeerChild limit
> >> >>> >>>
> >> >>> >>> I will try the things that you mentioned. I am not using the
> >> >>> >>> latest version (0.7.0) because I do not understand YARN yet. It
> >> >>> >>> adds extra configuration, which makes it harder for me to
> >> >>> >>> understand when things go wrong. Any suggestions?
> >> >>> >>>
> >> >>> >>> Further, are there any tools that you use for debugging while in
> >> >>> >>> distributed mode? And is there any way to manage the servers? I
> >> >>> >>> mean, right now I have 3 machines with a lot of configuration
> >> >>> >>> files and log files, and it takes a lot of time. This makes me
> >> >>> >>> wonder how people who have 100s of machines debug and manage the
> >> >>> >>> cluster.
> >> >>> >>>
> >> >>> >>> Regards,
> >> >>> >>> Behroz
> >> >>> >>>
> >> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <
> >> >>> edward.yoon@samsung.com>
> >> >>> >>> wrote:
> >> >>> >>>
> >> >>> >>> > Hi,
> >> >>> >>> >
> >> >>> >>> > It looks like a zookeeper connection problem. Please check
> >> >>> >>> > whether zookeeper is running and every task can connect to
> >> >>> >>> > zookeeper.
> >> >>> >>> >
> >> >>> >>> > I would recommend stopping the firewall during debugging, and
> >> >>> >>> > please use the latest 0.7.0 release.
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > --
> >> >>> >>> > Best Regards, Edward J. Yoon
> >> >>> >>> >
> >> >>> >>> > -----Original Message-----
> >> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com]
> >> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM
> >> >>> >>> > To: user@hama.apache.org
> >> >>> >>> > Subject: Re: GroomServer BSPPeerChild limit
> >> >>> >>> >
> >> >>> >>> > To figure out the issue, I was trying something else and found
> >> >>> >>> > another weird issue. It might be a bug in Hama, but I am not
> >> >>> >>> > sure. Both of the following lines throw an exception.
> >> >>> >>> >
> >> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception
> >> >>> >>> >
> >> >>> >>> > System.out.println( peer.getNumPeers()); //Exception
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp
> >> >>> function.*
> >> >>> >>> >
> >> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be
> >> >>> >>> retrieved!*
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
> >> >>> >>> >
> >> >>> >>> > at
> >> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
> >> >>> >>> >
> >> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
> >> >>> >>> >
> >> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
> >> >>> >>> >
> >> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
> >> >>> >>> >
> >> >>> >>> > at
> >> >>> >>>
> >> >>>
> >> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> >> >>> >>> >
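
For orientation, a minimal sketch of a BSP class invoking those two calls in
setup(); the class name and the NullWritable type parameters are illustrative,
while getNumPeers() and getPeerName(int) are the calls from the stack trace
above, which resolve the registered peers through ZooKeeper and fail with
"All peer names could not be retrieved!" when some tasks never register:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.sync.SyncException;

public class PeerInfoBSP extends
    BSP<NullWritable, NullWritable, NullWritable, NullWritable, NullWritable> {

  @Override
  public void setup(
      BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, NullWritable> peer)
      throws IOException, SyncException, InterruptedException {
    // Both lines throw if peer registration in ZooKeeper is incomplete.
    System.out.println("number of peers: " + peer.getNumPeers());
    System.out.println("first peer: " + peer.getPeerName(0));
  }

  @Override
  public void bsp(
      BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, NullWritable> peer)
      throws IOException, SyncException, InterruptedException {
    peer.sync(); // a single empty superstep
  }
}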
> >> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com>
> >> >>> >>> > wrote:
> >> >>> >>> >
> >> >>> >>> > > I think I have more information on the issue. I did some
> >> >>> debugging and
> >> >>> >>> > > found something quite strange.
> >> >>> >>> > >
> >> >>> >>> > > If I launch my job with 6 tasks (3 tasks run on MACHINE1 and
> >> >>> >>> > > 3 tasks on MACHINE2),
> >> >>> >>> > >
> >> >>> >>> > >  - the 3 tasks on MACHINE1 are frozen, and the strange thing
> >> >>> >>> > > is that the processes do not even enter the SETUP function of
> >> >>> >>> > > the BSP class. I have print statements in the setup function
> >> >>> >>> > > of the BSP class and it doesn't print anything. I get empty
> >> >>> >>> > > files with zero size.
> >> >>> >>> > >
> >> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
> >> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000000_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000000_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000001_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000001_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000002_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24
> >> >>> >>> > > attempt_201506281624_0001_000002_0.log
> >> >>> >>> > >
> >> >>> >>> > > - On MACHINE2, the code enters the SETUP function of the BSP
> >> >>> >>> > > class and prints output. See the size of the files generated.
> >> >>> >>> > > How is it possible that 3 tasks can enter BSP and the others
> >> >>> >>> > > cannot?
> >> >>> >>> > >
> >> >>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
> >> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000003_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000003_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000004_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000004_0.log
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000005_0.err
> >> >>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39
> >> >>> >>> > > attempt_201506281639_0001_000005_0.log
> >> >>> >>> > >
> >> >>> >>> > > - Hama Groom log file on MACHINE1 (which is frozen) shows:
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started.
> >> >>> >>> > >
> >> >>> >>> > > - Hama Groom log file on MACHINE2 shows
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*.
> >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task
> >> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*.
> >> >>> >>> > >
> >> >>> >>> > > Any clue what might be going wrong?
> >> >>> >>> > >
> >> >>> >>> > > Regards,
> >> >>> >>> > > Behroz
> >> >>> >>> > >
> >> >>> >>> > >
> >> >>> >>> > >
> >> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com>
> >> >>> >>> > > wrote:
> >> >>> >>> > >
> >> >>> >>> > >> Here is the log file from that folder
> >> >>> >>> > >>
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader
> #1
> >> for
> >> >>> port
> >> >>> >>> > >> 61001
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder:
> >> starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl:
> >> BSPPeer
> >> >>> >>> > >> address:b178b33b16cc port:61001
> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on
> >> 61001:
> >> >>> >>> > starting
> >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK
> >> Sync
> >> >>> Client
> >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start
> >> >>> connecting
> >> >>> >>> to
> >> >>> >>> > >> Zookeeper! At b178b33b16cc/172.17.0.7:61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> >> listener
> >> >>> on
> >> >>> >>> 61001
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server
> >> Responder
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on
> >> 61001:
> >> >>> >>> > exiting
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> > >> And my console shows the following output. Hama is frozen
> >> >>> >>> > >> right now.
> >> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job:
> >> >>> >>> > >> job_201506262331_0003
> >> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps
> >> >>> number: 0
> >> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps
> >> >>> number: 2
> >> >>> >>> > >>
> >> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <
> >> >>> >>> edwardyoon@apache.org>
> >> >>> >>> > >> wrote:
> >> >>> >>> > >>
> >> >>> >>> > >>> Please check the task logs in the $HAMA_HOME/logs/tasklogs
> >> >>> >>> > >>> folder.
> >> >>> >>> > >>>
> >> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <
> >> >>> behroz89@gmail.com
> >> >>> >>> >
> >> >>> >>> > >>> wrote:
> >> >>> >>> > >>> > Yeah, I also thought that. I ran the program through
> >> >>> >>> > >>> > Eclipse with 20 tasks and it works fine.
> >> >>> >>> > >>> >
> >> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <
> >> >>> >>> > edwardyoon@apache.org
> >> >>> >>> > >>> >
> >> >>> >>> > >>> > wrote:
> >> >>> >>> > >>> >
> >> >>> >>> > >>> >> > When I run the Pi example, it uses 9 tasks and runs
> >> >>> >>> > >>> >> > fine. When I run my program with 3 tasks, everything
> >> >>> >>> > >>> >> > runs fine. But when I increase the tasks (to 4) by
> >> >>> >>> > >>> >> > using "setNumBspTask", Hama freezes. I do not
> >> >>> >>> > >>> >> > understand what can go wrong.
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> It looks like a program bug. Have you run your program
> >> >>> >>> > >>> >> in local mode?
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <
> >> >>> >>> > behroz89@gmail.com>
> >> >>> >>> > >>> >> wrote:
> >> >>> >>> > >>> >> > Hi,
> >> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issues 1
> >> >>> >>> > >>> >> > and 3 are resolved, but issue 2 is still giving me
> >> >>> >>> > >>> >> > headaches.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > My problem:
> >> >>> >>> > >>> >> > My cluster now consists of 3 machines, each of them
> >> >>> >>> > >>> >> > properly configured (apparently). When I start Hadoop
> >> >>> >>> > >>> >> > and Hama from my master machine, I can see the
> >> >>> >>> > >>> >> > processes started on the other 2 machines. If I check
> >> >>> >>> > >>> >> > the maximum tasks that my cluster can support, I get 9
> >> >>> >>> > >>> >> > (3 tasks on each machine).
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > When I run the Pi example, it uses 9 tasks and runs
> >> >>> >>> > >>> >> > fine. When I run my program with 3 tasks, everything
> >> >>> >>> > >>> >> > runs fine. But when I increase the tasks (to 4) by
> >> >>> >>> > >>> >> > using "setNumBspTask", Hama freezes. I do not
> >> >>> >>> > >>> >> > understand what can go wrong.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > I checked the log files and things look fine. I just
> >> >>> >>> > >>> >> > sometimes get an exception that Hama was not able to
> >> >>> >>> > >>> >> > delete the system directory (bsp.system.dir) defined
> >> >>> >>> > >>> >> > in hama-site.xml.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > Any help or clue would be great.
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > Regards,
> >> >>> >>> > >>> >> > Behroz Sikander
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander <
> >> >>> >>> > >>> behroz89@gmail.com>
> >> >>> >>> > >>> >> wrote:
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> >> Thank you :)
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon <
> >> >>> >>> > >>> edwardyoon@apache.org
> >> >>> >>> > >>> >> >
> >> >>> >>> > >>> >> >> wrote:
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >>> Hi,
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> You can get the maximum number of available tasks
> >> >>> >>> > >>> >> >>> with the following code:
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>     BSPJobClient jobClient = new
> BSPJobClient(conf);
> >> >>> >>> > >>> >> >>>     ClusterStatus cluster =
> >> >>> jobClient.getClusterStatus(true);
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>     // Set to maximum
> >> >>> >>> > >>> >> >>>     bsp.setNumBspTask(cluster.getMaxTasks());
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander <
> >> >>> >>> > >>> behroz89@gmail.com>
> >> >>> >>> > >>> >> >>> wrote:
> >> >>> >>> > >>> >> >>> > Hi,
> >> >>> >>> > >>> >> >>> > 1) Thank you for this.
> >> >>> >>> > >>> >> >>> > 2) Here are the images. I will look into the log
> >> >>> >>> > >>> >> >>> > files of the Pi example.
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > *Result of JPS command on slave*
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > *Result of JPS command on Master*
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> >
> >> >>> >>>
> >> >>>
> >>
> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input
> >> >>> >>> > >>> >> >>> > submitted to the job. At run time, I directly
> >> >>> >>> > >>> >> >>> > fetch data from HDFS. So I am looking for
> >> >>> >>> > >>> >> >>> > something like BSPJob.set*Max*NumBspTask().
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > Regards,
> >> >>> >>> > >>> >> >>> > Behroz
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon
> <
> >> >>> >>> > >>> >> edwardyoon@apache.org
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> > wrote:
> >> >>> >>> > >>> >> >>> >
> >> >>> >>> > >>> >> >>> >> Hello,
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a
> >> >>> >>> > >>> >> >>> >> configuration using "FileSystem fs = FileSystem.get(conf);".
> >> >>> >>> > >>> >> >>> >> Of course, the fs.defaultFS property should be in
> >> >>> >>> > >>> >> >>> >> hama-site.xml:
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>   <property>
> >> >>> >>> > >>> >> >>> >>     <name>fs.defaultFS</name>
> >> >>> >>> > >>> >> >>> >>     <value>hdfs://host1.mydomain.com:9000/
> >> </value>
> >> >>> >>> > >>> >> >>> >>     <description>
> >> >>> >>> > >>> >> >>> >>       The name of the default file system.
> Either
> >> the
> >> >>> >>> literal
> >> >>> >>> > >>> string
> >> >>> >>> > >>> >> >>> >>       "local" or a host:port for HDFS.
> >> >>> >>> > >>> >> >>> >>     </description>
> >> >>> >>> > >>> >> >>> >>   </property>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks
> >> >>> >>> > >>> >> >>> >> per node. It looks like a cluster configuration
> >> >>> >>> > >>> >> >>> >> issue. Please run the Pi example and look at the
> >> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you cannot attach
> >> >>> >>> > >>> >> >>> >> images to the mailing list, so I can't see them.
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int)
> >> >>> >>> > >>> >> >>> >> method. If input is provided, the number of BSP
> >> >>> >>> > >>> >> >>> >> tasks is basically driven by the number of DFS
> >> >>> >>> > >>> >> >>> >> blocks. I'll fix it to be more flexible on
> >> >>> >>> > >>> >> >>> >> HAMA-956.
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> Thanks!
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz
> Sikander <
> >> >>> >>> > >>> >> behroz89@gmail.com>
> >> >>> >>> > >>> >> >>> >> wrote:
> >> >>> >>> > >>> >> >>> >> > Hi,
> >> >>> >>> > >>> >> >>> >> > Recently, I moved from a single-machine setup
> >> >>> >>> > >>> >> >>> >> > to a 2-machine setup. I was successfully able
> >> >>> >>> > >>> >> >>> >> > to run my job that uses HDFS to get data. I
> >> >>> >>> > >>> >> >>> >> > have 3 trivial questions:
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the
> >> >>> >>> > >>> >> >>> >> > IP address of the server running HDFS. I
> >> >>> >>> > >>> >> >>> >> > thought that Hama would automatically pick it
> >> >>> >>> > >>> >> >>> >> > from the configuration, but it does not. I am
> >> >>> >>> > >>> >> >>> >> > probably doing something wrong. Right now my
> >> >>> >>> > >>> >> >>> >> > code works by using the following:
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new
> >> >>> >>> > >>> URI("hdfs://server_ip:port/"),
> >> >>> >>> > >>> >> >>> conf);
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start Hama it
> >> >>> >>> > >>> >> >>> >> > automatically starts Hama on the slave machine
> >> >>> >>> > >>> >> >>> >> > (all good). Both master and slave are set as
> >> >>> >>> > >>> >> >>> >> > groomservers. This means that I have 2 servers
> >> >>> >>> > >>> >> >>> >> > to run my job, which means that I can open more
> >> >>> >>> > >>> >> >>> >> > BSPPeerChild processes. If I submit my jar with
> >> >>> >>> > >>> >> >>> >> > 3 bsp tasks, everything works fine. But when I
> >> >>> >>> > >>> >> >>> >> > move to 4 tasks, Hama freezes. Here is the
> >> >>> >>> > >>> >> >>> >> > result of the JPS command on the slave.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on
> >> >>> >>> > >>> >> >>> >> > the slaves but not on the master.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Note: I tried changing the bsp.tasks.maximum
> >> >>> >>> > >>> >> >>> >> > property in hama-default.xml to 4, but got the
> >> >>> >>> > >>> >> >>> >> > same result.
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many
> >> >>> >>> > >>> >> >>> >> > BSPPeerChild processes as possible. Is there
> >> >>> >>> > >>> >> >>> >> > any setting I can use to achieve that? Or does
> >> >>> >>> > >>> >> >>> >> > Hama pick up the values from hama-default.xml
> >> >>> >>> > >>> >> >>> >> > to open tasks?
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Regards,
> >> >>> >>> > >>> >> >>> >> >
> >> >>> >>> > >>> >> >>> >> > Behroz Sikander
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>> >> --
> >> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >> >>> >>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>> --
> >> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >> >>>
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >>
> >> >>> >>> > >>> >> --
> >> >>> >>> > >>> >> Best Regards, Edward J. Yoon
> >> >>> >>> > >>> >>
> >> >>> >>> > >>>
> >> >>> >>> > >>>
> >> >>> >>> > >>>
> >> >>> >>> > >>> --
> >> >>> >>> > >>> Best Regards, Edward J. Yoon
> >> >>> >>> > >>>
> >> >>> >>> > >>
> >> >>> >>> > >>
> >> >>> >>> > >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>> >
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > --
> >> >>> > Best Regards, Edward J. Yoon
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best Regards, Edward J. Yoon
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >>
> >
> >
>
>
>



