incubator-giraph-user mailing list archives

From Zhiwei Gu <guzhi...@gmail.com>
Subject Re: Giraph will fail while using more workers
Date Mon, 10 Oct 2011 19:46:30 GMT
Thank you, girapher. I'll try the latest version and report the result
later.

2011/10/10 Avery Ching <aching@apache.org>

>  Hi Zhiwei,
>
> The issue (known) is basically from here:
>
> 2011-10-08 09:27:05,236 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
> 	at java.lang.Thread.start0(Native Method)
> 	at java.lang.Thread.start(Thread.java:597)
> 	at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
> 	at java.security.AccessController.doPrivileged(Native Method)
>
> It has been addressed in GIRAPH-12 (
> https://issues.apache.org/jira/browse/GIRAPH-12).
>
>
> <snip>
> Currently every worker will start up a thread to communicate with every
> other worker. Hadoop RPC is used for communication. For instance, if there
> are 400 workers, each worker will create 400 threads. This ends up using a
> lot of memory on the stack per worker, even with the option
>
> -Dmapred.child.java.opts="-Xss64k".
> </snip>
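
To put rough numbers on the paragraph above, here is a minimal sketch of the stack arithmetic. It assumes a simplified model: one thread per peer worker, each reserving the full -Xss value. The class and method names are hypothetical and not part of Giraph or Hadoop, and the real footprint is likely higher because IPC handler threads and the JVM's own threads come on top.

```java
public final class ThreadStackEstimate {

    /** Stack bytes reserved when `threads` threads each get `stackBytesPerThread` (simplified model). */
    public static long totalStackBytes(int threads, long stackBytesPerThread) {
        if (threads < 0 || stackBytesPerThread < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return (long) threads * stackBytesPerThread;
    }

    public static void main(String[] args) {
        long xssBytes = 64L * 1024; // -Xss64k, as quoted in the message
        // Worker counts mentioned in this thread: 200 (works), 400 (example), 500 (fails).
        for (int workers : new int[] {200, 400, 500}) {
            long bytes = totalStackBytes(workers, xssBytes);
            System.out.printf("%d workers -> %d threads per worker -> %d KiB of stack%n",
                    workers, workers, bytes / 1024);
        }
    }
}
```

Under this model the stacks alone come to about 25 MiB per worker at 400 workers. Note that "unable to create new native thread" can also be triggered by operating-system limits on threads or processes per user, so the stack total is only one part of the picture.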
>
>
> It would be good if you could try the latest Apache Giraph instead of the
> older Yahoo! version. You will then need to set GiraphJob.MSG_NUM_FLUSH_THREADS
> (giraph.msgNumFlushThreads) to a value that won't cause you to run out of
> stack space.
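
One possible way to choose such a value is sketched below. The property name giraph.msgNumFlushThreads comes from the message above; everything else (the class, the method names, and the idea of sizing the thread count from a fixed stack budget) is a hypothetical heuristic, not something Giraph does itself. How the property is actually passed to the job depends on the Giraph version and on how the job is launched.

```java
public final class FlushThreadBudget {

    /** Property name quoted in the message; the rest of this class is illustrative only. */
    static final String MSG_NUM_FLUSH_THREADS = "giraph.msgNumFlushThreads";

    /**
     * Largest thread count whose stacks fit in the budget,
     * capped at the number of workers and never below 1.
     */
    public static int pickFlushThreads(int workers, long stackBudgetBytes, long stackBytesPerThread) {
        if (stackBytesPerThread <= 0) {
            throw new IllegalArgumentException("stackBytesPerThread must be positive");
        }
        long fit = stackBudgetBytes / stackBytesPerThread;
        long capped = Math.min((long) workers, fit);
        return (int) Math.max(1L, capped);
    }

    /** Renders the setting as a key=value pair suitable for a job configuration. */
    public static String asProperty(int flushThreads) {
        return MSG_NUM_FLUSH_THREADS + "=" + flushThreads;
    }

    public static void main(String[] args) {
        // Assumed numbers: 500 workers, an 8 MiB stack budget, 64 KiB per thread (-Xss64k).
        int threads = pickFlushThreads(500, 8L * 1024 * 1024, 64L * 1024);
        System.out.println(asProperty(threads));
    }
}
```

With the assumed 8 MiB budget and 64 KiB stacks, the sketch settles on 128 flush threads for a 500-worker job; a smaller budget or larger stacks would lower that number.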
>
> Avery
>  On 10/10/11 11:08 AM, Zhiwei Gu wrote:
>
> Hi all,
>   In my Giraph job, everything is fine when I set the number of workers to
> 200, but with 500 workers the job fails because of an early-stage OOM
> exception in one (or more) workers. Once such a worker fails, the other
> workers that want to talk to it keep waiting until they have retried 5
> times, and then they fail as well.
>
>  Have you ever faced such an issue?
>
>  Best,
> -z
>
>
>  Here is the exception,
> 2011-10-08 09:26:59,108 INFO org.apache.giraph.comm.RPCCommunications:
> getRPCServer: Added jobToken Ident: 17 6a 6f 62 5f 32 30 31 31 30 38 32 36
> 30 39 31 31 5f 36 36 37 30 39 30, Pass: 12 26 1a f1 d2 51 e1 bf 2d 36 63 11
> 26 18 17 3d 53 b3 15 f6, Kind: mapreduce.job, Service:
> job_201108260911_667090
>
> 2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,120 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort31250 registered.
> 2011-10-08 09:26:59,121 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort31250 registered.
> 2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 31250: starting
> 2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 31250: starting
> 2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 31250: starting
> 2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 31250: starting
> 2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 31250: starting
> 2011-10-08 09:26:59,137 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 31250: starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 10 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 11 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 12 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 13 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 31250: starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 15 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 18 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 31250: starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 21 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 22 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 23 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 24 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 25 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 26 on 31250: starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 27 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 30 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 31 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 32 on 31250: starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 33 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 34 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 35 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 36 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 37 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 38 on 31250: starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 39 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 40 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 41 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 42 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 43 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 44 on 31250: starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 45 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 46 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 47 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 48 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 49 on 31250: starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 50 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 51 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 52 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 53 on 31250: starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 54 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 55 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 56 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 57 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 31250: starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 59 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 60 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 61 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 62 on 31250: starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 63 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 64 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 65 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 66 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 67 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 68 on 31250: starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 69 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 70 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 71 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 72 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 73 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 74 on 31250: starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 75 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 76 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 77 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 78 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 79 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 80 on 31250: starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 81 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 82 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 83 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 84 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 85 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 86 on 31250: starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 88 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 89 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 90 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 91 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 92 on 31250: starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 93 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 94 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 95 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 96 on 31250: starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 97 on 31250: starting
> 2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server handler 98 on 31250: starting
> 2011-10-08 09:26:59,161 INFO org.apache.giraph.comm.BasicRPCCommunications: BasicRPCCommunications: Started RPC communication server: gsta33033.tan.ygrid.yahoo.com/10.216.176.59:31250 with 100 handlers
> 2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server handler 99 on 31250: starting
> 2011-10-08 09:27:05,234 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=102400 and reduceRetainSize=102400
> 2011-10-08 09:27:05,236 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
> 	at java.lang.Thread.start0(Native Method)
> 	at java.lang.Thread.start(Thread.java:597)
> 	at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.lang.UNIXProcess.<init>(UNIXProcess.java:103)
> 	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
> 	at org.apache.hadoop.util.Shell.run(Shell.java:182)
> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
> 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
> 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.access$100(RawLocalFileSystem.java:37)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:417)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:400)
> 	at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:275)
> 	at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:255)
>
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source ugi(org.apache.hadoop.security.UgiInstrumentation)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source jvm(org.apache.hadoop.metrics2.source.JvmMetricsSource)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source RpcDetailedActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation$Detailed)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source RpcActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
>
>
>  --
> Best Regards
> Zhiwei Gu
>
>
>
>


-- 
Best Regards
Zhiwei Gu
