hama-user mailing list archives

From Francisco Sanches <sanchesban...@gmail.com>
Subject Re: Error when executing BSPPeer process
Date Sun, 03 Feb 2013 10:53:10 GMT
Hi Suraj,

I made some changes to my application: it now packs the vertex id and the
distance into a Text message, which it sends to its adjacent vertices at the
end of each superstep. But I think the problem is as you observed, Suraj: my
application still uses too much memory. I have observed that the RAM on all
nodes of the cluster is fully utilized, and swap memory is also being used.
On a cluster of 14 machines with 4 GB of RAM each, the largest graph I could
run had 15,000 vertices; I need to be able to run one with at least 35,000
vertices. If I increase the memory to 8 GB, will I be able to run a bigger
graph? I will try to arrange the change to this memory setting here; it may
take a few days, and when I finish I will post the result.
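
For reference, the packing I describe above looks roughly like the sketch
below. It is not my exact code: the class name, the "id:distance" encoding
and the simple minimum-distance update are just placeholders, and the
compute(Iterator<Text>) signature follows the 0.6-era graph API, so trunk
may differ slightly.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hama.graph.Vertex;

    // Sketch only: pack "vertexId:distance" into a single Text message per
    // superstep and fan it out to the adjacent vertices.
    public class PackedDistanceVertex extends Vertex<Text, NullWritable, Text> {

      @Override
      public void compute(Iterator<Text> messages) throws IOException {
        long best = Long.MAX_VALUE;
        while (messages.hasNext()) {
          // Incoming messages are "sourceId:distance" pairs; keep the
          // smallest distance seen, plus one hop.
          String[] parts = messages.next().toString().split(":");
          best = Math.min(best, Long.parseLong(parts[1]) + 1);
        }
        if (best != Long.MAX_VALUE) {
          // Pack our own id together with the updated distance into one
          // Text and send it to all neighbors.
          sendMessageToNeighbors(new Text(getVertexID() + ":" + best));
        }
        voteToHalt();
      }
    }

A rough sizing note: since the per-vertex vector stays at 65537 entries,
memory should grow about linearly with the number of vertices, so a
35,000-vertex run would need roughly 2.3 times (35000 / 15000) the memory of
the 15,000-vertex run that already fills the 4 GB nodes.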

2013/2/2 Suraj Menon <surajsmenon@apache.org>

> Hi Francisco, by any chance, is your aggregation runner getting messages
> from all vertices, with each message being the vector from each vertex?
> That is 35000 * 65537 * 4 (assuming integers) = roughly 9 GB. Apologies if
> that is not the case, because assuming you have 15 tasks for the purpose,
> the memory usage per task comes to around 600 MB. I am looking into
> limiting the memory used during Vertex#compute. Apologies for the delay.
>
> On Wed, Jan 30, 2013 at 2:01 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
> > Have you tried to increase the number of tasks?
> >
> > On Sat, Jan 26, 2013 at 2:34 AM, Francisco Sanches
> > <sanchesbanhos@gmail.com> wrote:
> > > Hi
> > >
> > > Graphs with up to 15,000 vertices can run. I believe it is a memory
> > > issue: a quick look at memory showed a lot of swap being used.
> > >
> > > 2013/1/24 Edward J. Yoon <edwardyoon@apache.org>:
> > >> Have you checked your algorithm with small input? I'm not sure that is
> > >> a memory issue.
> > >>
> > >> On Thu, Jan 24, 2013 at 7:53 PM, Francisco Sanches
> > >> <sanchesbanhos@gmail.com> wrote:
> > >>> Hi thanks for the reply,
> > >>>
> > >>> My input is a graph of 35,000 vertices, but my application uses a
> > >>> vector of 65537 positions to store the distances. Each node has a
> > >>> vector of that size (all-pairs).
> > >>>
> > >>> 2013/1/24 Edward J. Yoon <edwardyoon@apache.org>:
> > >>>> Hi,
> > >>>>
> > >>>> How large is your input data?
> > >>>>
> > >>>> On Tue, Jan 22, 2013 at 11:21 PM, Francisco Sanches
> > >>>> <sanchesbanhos@gmail.com> wrote:
> > >>>>> Hi,
> > >>>>>
> > >>>>> When I try to run a larger graph in the application that I developed
> > >>>>> here, I first get this message:
> > >>>>>
> > >>>>> hduser@node28:/usr/local/hama-trunk$ time bin/hama jar aplic19jan.jar 65537 /hdfs/disco/dados/internet_a /hdfs/disco/outinternet_a 42 30
> > >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
> > >>>>> 13/01/22 10:59:14 INFO bsp.FileInputFormat: Total input paths to process : 1
> > >>>>> 13/01/22 10:59:17 INFO bsp.BSPJobClient: Running job: job_201301221058_0002
> > >>>>> 13/01/22 10:59:20 INFO bsp.BSPJobClient: Current supersteps number: 0
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Current supersteps number: 1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: The total number of supersteps: 1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient: Counters: 6
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEPS=1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=1
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=70
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     IO_BYTES_READ=3467587
> > >>>>> 13/01/22 10:59:32 INFO bsp.BSPJobClient:     TASK_INPUT_RECORDS=42089
> > >>>>> 13/01/22 10:59:32 INFO bsp.FileInputFormat: Total input paths to process : 42
> > >>>>> 13/01/22 10:59:35 INFO bsp.BSPJobClient: Running job: job_201301221058_0001
> > >>>>> 13/01/22 10:59:38 INFO bsp.BSPJobClient: Current supersteps number: 0
> > >>>>> 13/01/22 11:00:38 INFO bsp.BSPJobClient: Current supersteps number: 1
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: Class path contains multiple SLF4J bindings.
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in [jar:file:/usr/local/hama-trunk/lib/slf4j-log4j12-1.5.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: Found binding in [file:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/org/slf4j/impl/StaticLoggerBinder.class]
> > >>>>> attempt_201301221058_0001_000001_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:14 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At node30/10.16.2.30:61003
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server: Starting SocketReader
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server: IPC Server Responder: starting
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server: IPC Server listener on 61003: starting
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO message.HadoopMessageManagerImpl: BSPPeer address:node30 port:61003
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:16 INFO ipc.Server: IPC Server handler 0 on 61003: starting
> > >>>>> attempt_201301221058_0001_000001_0: 13/01/22 11:00:34 INFO graph.GraphJobRunner: Loading finished at 1 steps.
> > >>>>> 13/01/22 11:04:46 INFO bsp.BSPJobClient: Job failed.
> > >>>>>
> > >>>>> real 5m34.523s
> > >>>>> user 0m4.832s
> > >>>>> sys 0m0.960s
> > >>>>>
> > >>>>> Analyzing the logs, the one that fails has this log:
> > >>>>>
> > >>>>> /************************************************************
> > >>>>> STARTUP_MSG: Starting GroomServer
> > >>>>> STARTUP_MSG:   host = node30/10.16.2.30
> > >>>>> STARTUP_MSG:   args = [node30]
> > >>>>> STARTUP_MSG:   version = 1.0.0
> > >>>>> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0-r 1214675; compiled by 'hortonfo' on Fri Dec 16 20:01:27 UTC 2011
> > >>>>> ************************************************************/
> > >>>>> 2013-01-22 10:58:51,118 INFO org.apache.hama.bsp.GroomServer: groom start
> > >>>>> 2013-01-22 10:58:51,420 INFO org.apache.hama.bsp.GroomServer: /tmp/hama-hduser/bsp/local
> > >>>>> 2013-01-22 10:58:55,417 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> > >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50000: starting
> > >>>>> 2013-01-22 10:58:55,452 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> > >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hama.bsp.GroomServer: Worker rpc server --> node30:50000
> > >>>>> 2013-01-22 10:58:55,455 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50000: starting
> > >>>>> 2013-01-22 10:58:55,560 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.bsp.GroomServer: starting webserver: node30
> > >>>>> 2013-01-22 10:58:55,645 INFO org.apache.hama.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 40015
> > >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: listener.getLocalPort() returned 40015 webServer.getConnectors()[0].getLocalPort() returned 40015
> > >>>>> 2013-01-22 10:58:55,646 INFO org.apache.hama.http.HttpServer: Jetty bound to port 40015
> > >>>>> 2013-01-22 10:58:55,646 INFO org.mortbay.log: jetty-6.1.14
> > >>>>> 2013-01-22 10:58:56,252 INFO org.mortbay.log: Started SelectChannelConnector@node30:40015
> > >>>>> 2013-01-22 10:59:01,574 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> > >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> > >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 59102: starting
> > >>>>> 2013-01-22 10:59:01,576 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,580 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,577 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,583 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,584 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 59102: starting
> > >>>>> 2013-01-22 10:59:01,585 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 59102: starting
> > >>>>> 2013-01-22 10:59:06,899 INFO org.apache.hama.bsp.GroomServer: GroomServer up at: 127.0.0.1/127.0.0.1:59102
> > >>>>> 2013-01-22 10:59:06,900 INFO org.apache.hama.bsp.GroomServer: Starting groom: node30:50000
> > >>>>> 2013-01-22 10:59:18,083 INFO org.apache.hama.bsp.GroomServer: Launch 1 tasks.
> > >>>>> 2013-01-22 10:59:18,283 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>>> 2013-01-22 10:59:23,057 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201301221058_0002_000000_0' has started.
> > >>>>> 2013-01-22 10:59:31,403 INFO org.apache.hama.bsp.GroomServer: Task attempt_201301221058_0002_000000_0 is done.
> > >>>>> 2013-01-22 10:59:35,757 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > >>>>> 2013-01-22 10:59:56,486 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201301221058_0001_000000_0' has started.
> > >>>>> 2013-01-22 10:59:56,487 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > >>>>> 2013-01-22 11:00:06,973 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201301221058_0001_000002_0' has started.
> > >>>>> 2013-01-22 11:00:06,974 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
> > >>>>> 2013-01-22 11:00:12,927 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201301221058_0001_000001_0' has started.
> > >>>>> 2013-01-22 11:03:50,330 ERROR org.apache.hama.bsp.TaskRunner: Error when executing BSPPeer process.
> > >>>>> java.io.IOException: BSP task process exit with nonzero status of 137.
> > >>>>> command = [/usr/lib/jvm/jdk1.7.0_11/jre/bin/java, -Xmx6144m, -classpath,
> /usr/local/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle//lib/tools.jar:/usr/local/hama-trunk/bin/../core/target/classes:/usr/local/hama-trunk/bin/../hama-**.jar:/usr/local/hama-trunk/bin/../lib/ant-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/hama-trunk/bin/../lib/avro-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/usr/local/hama-trunk/bin/../lib/commons-cli-1.2.jar:/usr/local/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/usr/local/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hama-trunk/bin/../lib/commons-lang-2.6.jar:/usr/local/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/usr/local/hama-trunk/bin/../lib/commons-math3-3.0.jar:/usr/local/hama-trunk/bin/../lib/guava-10.0.1.jar:/usr/local/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/usr/local/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/usr/local/hama-trunk/bin/../lib/jetty-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/usr/local/hama-trunk/bin/../lib/junit-4.8.1.jar:/usr/local/hama-trunk/bin/../lib/log4j-1.2.16.jar:/usr/local/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/usr/local/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/usr/local/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/slf4j-log4j12-1.5.8.jar:/usr/local/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/usr/local/hama-trunk/bin/../lib/zookeeper-3.4.5.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201301221058_0001_000001_0/work,
> > >>>>> org.apache.hama.bsp.GroomServer$BSPPeerChild, 127.0.0.1, 59102, attempt_201301221058_0001_000001_0, node30, -1, RUNNING]
> > >>>>>         at org.apache.hama.bsp.TaskRunner$BspChildRunner.call(TaskRunner.java:126)
> > >>>>>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> > >>>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> > >>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > >>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > >>>>>         at java.lang.Thread.run(Thread.java:722)
> > >>>>> 2013-01-22 11:04:39,789 INFO org.apache.hama.bsp.GroomServer: adding purge task: attempt_201301221058_0001_000001_0
> > >>>>> 2013-01-22 11:04:39,884 INFO org.apache.hama.bsp.GroomServer: About to purge task: attempt_201301221058_0001_000001_0
> > >>>>> 2013-01-22 11:04:41,170 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
> > >>>>> 2013-01-22 11:04:41,219 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
> > >>>>> 2013-01-22 11:04:41,589 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
> > >>>>>
> > >>>>>
> > >>>>> What can it be? Some configuration? To avoid the Java heap space
> > >>>>> message, I configured:
> > >>>>>
> > >>>>>   <name>bsp.child.java.opts</name>
> > >>>>>   <value>-Xmx6144m</value>
> > >>>>>
> > >>>>> I have only 4 GB of memory in each GroomServer; could that be the
> > >>>>> problem? I am using the trunk version of Hama.
> > >>>>>
> > >>>>> --
> > >>>>> Francisco Sanches
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best Regards, Edward J. Yoon
> > >>>> @eddieyoon
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Francisco Sanches
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Edward J. Yoon
> > >> @eddieyoon
> > >
> > >
> > >
> > > --
> > > Francisco Sanches
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon
> > @eddieyoon
> >
>



-- 
Francisco Sanches
