giraph-user mailing list archives

From Christian Kunz <christ...@jybe-inc.com>
Subject Re: Giraph will fail while using more workers
Date Mon, 10 Oct 2011 18:17:45 GMT
Did you try something like
-Dmapred.child.java.opts="-Xss64k"?
(see GIRAPH-12)

Christian

On Oct 10, 2011, at 11:08 AM, Zhiwei Gu wrote:

> Hi all,
>   In my Giraph job, everything works when I set the number of workers to 200, but with 500
> workers the job fails because of an early-stage OOM exception in one or more workers. Once
> such a worker fails, every other worker that wants to talk to it keeps waiting until it has
> retried 5 times, and then fails as well.
> 
> Have you ever faced such an issue?
> 
> Best,
> -z
> 
> 
> Here is the exception:
> 2011-10-08 09:26:59,108 INFO org.apache.giraph.comm.RPCCommunications: getRPCServer: Added jobToken Ident: 17 6a 6f 62 5f 32 30 31 31 30 38 32 36 30 39 31 31 5f 36 36 37 30 39 30, Pass: 12 26 1a f1 d2 51 e1 bf 2d 36 63 11 26 18 17 3d 53 b3 15 f6, Kind: mapreduce.job, Service: job_201108260911_667090
> 2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,116 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,117 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
> 2011-10-08 09:26:59,120 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
for source RpcDetailedActivityForPort31250 registered.
> 2011-10-08 09:26:59,121 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean
for source RpcActivityForPort31250 registered.
> 2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2011-10-08 09:26:59,123 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 31250:
starting
> 2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 31250:
starting
> 2011-10-08 09:26:59,127 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 31250:
starting
> 2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 31250:
starting
> 2011-10-08 09:26:59,133 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 31250:
starting
> 2011-10-08 09:26:59,137 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 31250:
starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 31250:
starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 31250:
starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 31250:
starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 31250:
starting
> 2011-10-08 09:26:59,144 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 31250:
starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 10 on 31250:
starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 11 on 31250:
starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 12 on 31250:
starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 13 on 31250:
starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 31250:
starting
> 2011-10-08 09:26:59,145 INFO org.apache.hadoop.ipc.Server: IPC Server handler 15 on 31250:
starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 31250:
starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 31250:
starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 18 on 31250:
starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 31250:
starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 31250:
starting
> 2011-10-08 09:26:59,146 INFO org.apache.hadoop.ipc.Server: IPC Server handler 21 on 31250:
starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 22 on 31250:
starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 23 on 31250:
starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 24 on 31250:
starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 25 on 31250:
starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 26 on 31250:
starting
> 2011-10-08 09:26:59,147 INFO org.apache.hadoop.ipc.Server: IPC Server handler 27 on 31250:
starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on 31250:
starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 31250:
starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 30 on 31250:
starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 31 on 31250:
starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 32 on 31250:
starting
> 2011-10-08 09:26:59,148 INFO org.apache.hadoop.ipc.Server: IPC Server handler 33 on 31250:
starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 34 on 31250:
starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 35 on 31250:
starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 36 on 31250:
starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 37 on 31250:
starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 38 on 31250:
starting
> 2011-10-08 09:26:59,149 INFO org.apache.hadoop.ipc.Server: IPC Server handler 39 on 31250:
starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 40 on 31250:
starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 41 on 31250:
starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 42 on 31250:
starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 43 on 31250:
starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 44 on 31250:
starting
> 2011-10-08 09:26:59,150 INFO org.apache.hadoop.ipc.Server: IPC Server handler 45 on 31250:
starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 46 on 31250:
starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 47 on 31250:
starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 48 on 31250:
starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 49 on 31250:
starting
> 2011-10-08 09:26:59,151 INFO org.apache.hadoop.ipc.Server: IPC Server handler 50 on 31250:
starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 51 on 31250:
starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 52 on 31250:
starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 53 on 31250:
starting
> 2011-10-08 09:26:59,152 INFO org.apache.hadoop.ipc.Server: IPC Server handler 54 on 31250:
starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 55 on 31250:
starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 56 on 31250:
starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 57 on 31250:
starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 31250:
starting
> 2011-10-08 09:26:59,153 INFO org.apache.hadoop.ipc.Server: IPC Server handler 59 on 31250:
starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 60 on 31250:
starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 61 on 31250:
starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 62 on 31250:
starting
> 2011-10-08 09:26:59,154 INFO org.apache.hadoop.ipc.Server: IPC Server handler 63 on 31250:
starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 64 on 31250:
starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 65 on 31250:
starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 66 on 31250:
starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 67 on 31250:
starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 68 on 31250:
starting
> 2011-10-08 09:26:59,155 INFO org.apache.hadoop.ipc.Server: IPC Server handler 69 on 31250:
starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 70 on 31250:
starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 71 on 31250:
starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 72 on 31250:
starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 73 on 31250:
starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 74 on 31250:
starting
> 2011-10-08 09:26:59,156 INFO org.apache.hadoop.ipc.Server: IPC Server handler 75 on 31250:
starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 76 on 31250:
starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 77 on 31250:
starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 78 on 31250:
starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 79 on 31250:
starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 80 on 31250:
starting
> 2011-10-08 09:26:59,157 INFO org.apache.hadoop.ipc.Server: IPC Server handler 81 on 31250:
starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 82 on 31250:
starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 83 on 31250:
starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 84 on 31250:
starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 85 on 31250:
starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 86 on 31250:
starting
> 2011-10-08 09:26:59,158 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 31250:
starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 88 on 31250:
starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 89 on 31250:
starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 90 on 31250:
starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 91 on 31250:
starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 92 on 31250:
starting
> 2011-10-08 09:26:59,159 INFO org.apache.hadoop.ipc.Server: IPC Server handler 93 on 31250:
starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 94 on 31250:
starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 95 on 31250:
starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 96 on 31250:
starting
> 2011-10-08 09:26:59,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 97 on 31250:
starting
> 2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server handler 98 on 31250:
starting
> 2011-10-08 09:26:59,161 INFO org.apache.giraph.comm.BasicRPCCommunications: BasicRPCCommunications: Started RPC communication server: gsta33033.tan.ygrid.yahoo.com/10.216.176.59:31250 with 100 handlers
> 2011-10-08 09:26:59,161 INFO org.apache.hadoop.ipc.Server: IPC Server handler 99 on 31250: starting
> 2011-10-08 09:27:05,234 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=102400 and reduceRetainSize=102400
> 2011-10-08 09:27:05,236 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: unable to create new native thread
> 	at java.lang.Thread.start0(Native Method)
> 	at java.lang.Thread.start(Thread.java:597)
> 	at java.lang.UNIXProcess$1.run(UNIXProcess.java:141)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.lang.UNIXProcess.<init>(UNIXProcess.java:103)
> 	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
> 	at org.apache.hadoop.util.Shell.run(Shell.java:182)
> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
> 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
> 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.access$100(RawLocalFileSystem.java:37)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:417)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:400)
> 	at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:275)
> 	at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:255)
> 
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source ugi(org.apache.hadoop.security.UgiInstrumentation)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source jvm(org.apache.hadoop.metrics2.source.JvmMetricsSource)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source RpcDetailedActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation$Detailed)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping metrics source RpcActivityForPort31250(org.apache.hadoop.ipc.metrics.RpcInstrumentation)
> 2011-10-08 09:27:05,272 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
> 
> -- 
> Best Regards
> Zhiwei Gu
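
A rough way to see why a smaller -Xss might help with "unable to create new native thread": the log shows the RPC server starting with 100 handler threads, and each Java thread reserves its own stack. The sketch below only does that arithmetic. The 1024 KiB default stack size is an assumption (it varies by JVM and platform), and the class and method names are made up for illustration. Other limits, such as a per-user process/thread cap on the node, can produce the same error, so this is not a full diagnosis.

```java
public class ThreadStackEstimate {

    // Total stack space reserved by `threads` threads, in KiB.
    static long reservedStackKiB(int threads, long stackKiBPerThread) {
        return (long) threads * stackKiBPerThread;
    }

    public static void main(String[] args) {
        int rpcHandlers = 100;          // from the log: "with 100 handlers"
        long assumedDefaultKiB = 1024;  // ASSUMPTION: default stack size; check your JVM
        long suggestedKiB = 64;         // from the suggested -Xss64k

        System.out.println("assumed default: "
                + reservedStackKiB(rpcHandlers, assumedDefaultKiB) + " KiB");
        System.out.println("with -Xss64k:    "
                + reservedStackKiB(rpcHandlers, suggestedKiB) + " KiB");
    }
}
```

Under these assumptions the handler threads alone reserve about 100 MiB of stack per task with the default, versus about 6 MiB with -Xss64k. Any further threads a worker creates add to this, so the saving could be larger in practice.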

