incubator-hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Awesome bench results after removing Thread.sleep in sync() method.
Date Thu, 22 Sep 2011 08:43:13 GMT
I think that if we just change the log level, log4j will take care of the
if(isEnabled) check, so we don't need to fragment our code.
Yes, the current rev in trunk contains this snippet. Here is the rest of
the exception:

org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201109220959_0001/224/ready
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
         at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
         at org.apache.hama.bsp.BSPPeer$1.process(BSPPeer.java:396)
         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
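
The trace shows ZooKeeper.delete() throwing NoNode from inside the watcher callback. As a rough sketch of a delete that tolerates the znode already being gone, the following uses a hypothetical in-memory stand-in (FakeZk, NoSuchNodeException, deleteIfPresent and the path are all made up for illustration; the real exception is KeeperException.NoNodeException). It only covers the delete itself and says nothing about why the job hangs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TolerantDelete {

    /** Hypothetical stand-in for KeeperException.NoNodeException. */
    public static class NoSuchNodeException extends Exception {
        public NoSuchNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Hypothetical in-memory znode store; only create and delete are modelled. */
    public static class FakeZk {
        private final ConcurrentMap<String, Boolean> nodes = new ConcurrentHashMap<>();

        public void create(String path) {
            nodes.put(path, Boolean.TRUE);
        }

        public void delete(String path) throws NoSuchNodeException {
            // remove() is atomic, so exactly one caller sees the node.
            if (nodes.remove(path) == null) {
                throw new NoSuchNodeException(path);
            }
        }
    }

    /** Returns true if this caller removed the node, false if it was already gone. */
    public static boolean deleteIfPresent(FakeZk zk, String path) {
        try {
            zk.delete(path);
            return true;
        } catch (NoSuchNodeException e) {
            // Another peer got there first; treat as success, not as an error.
            return false;
        }
    }

    /** Two peers delete the same /ready node one after the other; counts successful deletes. */
    public static int deleteTwice() {
        FakeZk zk = new FakeZk();
        String ready = "/bsp/example_job/0/ready"; // hypothetical path
        zk.create(ready);
        int winners = 0;
        if (deleteIfPresent(zk, ready)) winners++;
        if (deleteIfPresent(zk, ready)) winners++;
        return winners;
    }

    public static void main(String[] args) {
        System.out.println("successful deletes: " + deleteTwice()); // prints 1
    }
}
```

The second delete finds nothing and returns false instead of propagating the exception, which is the behaviour the catch block in the quoted enterBarrier() snippet is aiming for.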

Here is the relevant part of our ZooKeeper daemon's log:

> 2011-09-22 09:59:59,435 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x1329025208e0003 type:delete
> cxid:0xc01 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error
> Path:/bsp/job_201109220959_0001/222/ready Error:KeeperErrorCode = NoNode for
> /bsp/job_201109220959_0001/222/ready
> 2011-09-22 09:59:59,499 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x1329025208e0003 type:create
> cxid:0xc0e zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error
> Path:/bsp/job_201109220959_0001/223/ready Error:KeeperErrorCode = NodeExists
> for /bsp/job_201109220959_0001/223/ready
> 2011-09-22 09:59:59,627 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x1329025208e0004 type:delete
> cxid:0xc22 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error
> Path:/bsp/job_201109220959_0001/224/ready Error:KeeperErrorCode = NoNode for
> /bsp/job_201109220959_0001/224/ready
>
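
On the isInfoEnabled() guard suggested in the quoted message below, here is a minimal sketch of what the guard does and does not save. It uses java.util.logging as a stand-in, purely so it runs on a plain JDK (that is an assumption; the quoted suggestion uses LOG.isInfoEnabled()). The class name, describe() and the counter are hypothetical. The logger drops a disabled message either way; the guard only avoids building the message string in the first place.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how often the message string is actually built.
    private static final AtomicInteger BUILDS = new AtomicInteger();

    // Stands in for any message that needs string concatenation.
    private static String describe(int superstep) {
        BUILDS.incrementAndGet();
        return "Finished superstep " + superstep;
    }

    // Unguarded: the argument is evaluated before the logger checks its level.
    private static void logUnguarded(int superstep) {
        LOG.info(describe(superstep));
    }

    // Guarded: the message is only built when INFO is enabled.
    private static void logGuarded(int superstep) {
        if (LOG.isLoggable(Level.INFO)) {
            LOG.info(describe(superstep));
        }
    }

    /** Logs n messages with INFO disabled and returns how many message strings were built. */
    public static int buildsWithInfoDisabled(boolean guarded, int n) {
        LOG.setLevel(Level.WARNING); // INFO is now below the threshold
        BUILDS.set(0);
        for (int i = 0; i < n; i++) {
            if (guarded) {
                logGuarded(i);
            } else {
                logUnguarded(i);
            }
        }
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded builds: " + buildsWithInfoDisabled(false, 1000)); // prints 1000
        System.out.println("guarded builds:   " + buildsWithInfoDisabled(true, 1000));  // prints 0
    }
}
```

Whether that saving matters for the bench example depends on how much of the 80% is spent on concatenation versus actual output, which only the profiler can tell.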

2011/9/22 ChiaHung Lin <chl501@nuk.edu.tw>

> We might need to change the logging calls by wrapping them in
>
> if(LOG.isInfoEnabled()){
>  ...
> }
>
> At least that avoids the string concatenation when the level is disabled.
> (Debug calls can be wrapped in if(LOG.isDebugEnabled()){} for the same
> reason.)
>
> In addition, can you help check if enterBarrier() contains the following
> code snippet?
>
>   ...
>   zk.exists(pathToSuperstepZnode+"/ready", new Watcher() {
>      @Override
>      public void process(WatchedEvent event) {
>          // check if /ready znode exists, then delete it.
>          ...
>          } catch(KeeperException.NoNodeException nne) {
>            LOG.warn("Ignore because znode may be deleted.", nne);
>          }...
>      }
>    });
>    zk.create(getNodeName(), null, Ids.OPEN_ACL_UNSAFE,
> CreateMode.EPHEMERAL);
>    ...
>
> It looks like a bsp peer is trying to remove the /ready znode, which may
> have already been removed by another bsp peer. Otherwise, the stack trace
> from the log would be helpful.
>
>
> -----Original message-----
> From:Thomas Jungblut <thomas.jungblut@googlemail.com>
> To:hama-dev@incubator.apache.org
> Date:Thu, 22 Sep 2011 10:05:52 +0200
> Subject:Re: Awesome bench results after removing Thread.sleep in sync()
> method.
>
> You're going to laugh, but we spend 80% of the time logging the messages.
> Let's change the log level to debug or remove the logging in the bench
> example.
>
> Sadly I still receive
>
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> > NoNode for /bsp/job_201109220959_0001/224/ready
> >
>
> and it hangs forever. I'm on the current version, from after you committed
> ChiaHung's patch.
> I'm in pseudo-distributed mode with 3 tasks.
>
> Are you going to bench this without the logging? That would be interesting
> though ;D
>
> 2011/9/22 Thomas Jungblut <thomas.jungblut@googlemail.com>
>
> > That is great. I think we can push this under 200s.
> > I'll attach a profiler and send you a list of hotspots.
> >
> > lg.
> >
> > 2011/9/22 Edward J. Yoon <edwardyoon@apache.org>
> >
> >> With ChiaHung's HAMA-387.patch, the hang problem is fixed.
> >>
> >> Also, in the same environment (1 rack, 256 cores), the bench example
> >> result improved dramatically (184.076 seconds, down from 307.129
> >> seconds).
> >>
> >> ----
> >> # core/bin/hama jar
> >> examples/target/hama-examples-0.4.0-incubating-SNAPSHOT.jar bench 16
> >> 1000 512
> >> ..
> >> 11/09/22 10:27:32 INFO bsp.BSPJobClient: Current supersteps number: 504
> >> 11/09/22 10:27:35 INFO bsp.BSPJobClient: Current supersteps number: 508
> >> 11/09/22 10:27:38 INFO bsp.BSPJobClient: Current supersteps number: 512
> >> 11/09/22 10:27:38 INFO bsp.BSPJobClient: The total number of supersteps:
> >> 512
> >> Job Finished in 184.076 seconds
> >>
> >> Hama 0.4 (r.1163903) was:
> >>
> >> 16 bytes | 1000 | 512 | 307.129 seconds
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
> >
> >
> >
> > --
> > Thomas Jungblut
> > Berlin
> >
> > mobile: 0170-3081070
> >
> > business: thomas.jungblut@testberichte.de
> > private: thomas.jungblut@gmail.com
> >
>
>
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan
>



-- 
Thomas Jungblut
Berlin

mobile: 0170-3081070

business: thomas.jungblut@testberichte.de
private: thomas.jungblut@gmail.com
