spark-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-16914) NodeManager crashes when Spark is registering executor information into LevelDB
Date Mon, 22 Aug 2016 13:07:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430725#comment-15430725 ]

Thomas Graves commented on SPARK-16914:
---------------------------------------

That is considered a fatal issue for the NodeManager, and it is expected to fail. Hardware goes
bad sometimes, so you either make sure these paths are durable or the NodeManager is going
to crash. I'm not sure what other options you have here.
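
As a hedged illustration of the "make sure these paths are durable" advice: YARN's NodeManager disk health checker can mark failing local dirs as bad instead of continuing to use them. The property names below are the standard disk-health-checker settings from yarn-default.xml; the values shown are only illustrative, and this does not guarantee protection for a directory whose disk fails after the LevelDB store has already been created there.

{code}
<!-- yarn-site.xml (values illustrative): have the NodeManager's disk
     health checker take bad local dirs out of service -->
<property>
  <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
  <value>0.25</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
{code}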



> NodeManager crashes when Spark is registering executor information into LevelDB
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-16914
>                 URL: https://issues.apache.org/jira/browse/SPARK-16914
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 1.6.2
>            Reporter: cen yuhai
>
> {noformat}
> Stack: [0x00007fb5b53de000,0x00007fb5b54df000],  sp=0x00007fb5b54dcba8,  free space=1018k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C  [libc.so.6+0x896b1]  memcpy+0x11
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
> j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
> j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
> j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
> j  org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
> j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
> j  org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.registerExecutor(Ljava/lang/String;Ljava/lang/String;Lorg/apache/spark/network/shuffle/protocol/ExecutorShuffleInfo;)V+61
> J 8429 C2 org.apache.spark.network.server.TransportRequestHandler.handle(Lorg/apache/spark/network/protocol/RequestMessage;)V (100 bytes) @ 0x00007fb5f27ff6cc [0x00007fb5f27fdde0+0x18ec]
> J 8371 C2 org.apache.spark.network.server.TransportChannelHandler.channelRead0(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (10 bytes) @ 0x00007fb5f242df20 [0x00007fb5f242de80+0xa0]
> J 6853 C2 io.netty.channel.SimpleChannelInboundHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (74 bytes) @ 0x00007fb5f215587c [0x00007fb5f21557e0+0x9c]
> J 5872 C2 io.netty.handler.timeout.IdleStateHandler.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (42 bytes) @ 0x00007fb5f2183268 [0x00007fb5f2183100+0x168]
> J 5849 C2 io.netty.handler.codec.MessageToMessageDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (158 bytes) @ 0x00007fb5f2191524 [0x00007fb5f218f5a0+0x1f84]
> J 5941 C2 org.apache.spark.network.util.TransportFrameDecoder.channelRead(Lio/netty/channel/ChannelHandlerContext;Ljava/lang/Object;)V (170 bytes) @ 0x00007fb5f220a230 [0x00007fb5f2209fc0+0x270]
> J 7747 C2 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read()V (363 bytes) @ 0x00007fb5f264465c [0x00007fb5f2644140+0x51c]
> J 8008% C2 io.netty.channel.nio.NioEventLoop.run()V (162 bytes) @ 0x00007fb5f26f6764 [0x00007fb5f26f63c0+0x3a4]
> j  io.netty.util.concurrent.SingleThreadEventExecutor$2.run()V+13
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {noformat}
> The relevant code on the Spark side is in ExternalShuffleBlockResolver:
> {code}
>   /** Registers a new Executor with all the configuration we need to find its shuffle files. */
>   public void registerExecutor(
>       String appId,
>       String execId,
>       ExecutorShuffleInfo executorInfo) {
>     AppExecId fullId = new AppExecId(appId, execId);
>     logger.info("Registered executor {} with {}", fullId, executorInfo);
>     try {
>       if (db != null) {
>         byte[] key = dbAppExecKey(fullId);
>         byte[] value = mapper.writeValueAsString(executorInfo).getBytes(Charsets.UTF_8);
>         db.put(key, value);
>       }
>     } catch (Exception e) {
>       logger.error("Error saving registered executors", e);
>     }
>     executors.put(fullId, executorInfo);
>   }
> {code}
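> One thing worth noting about the code above: the try/catch around db.put() only protects against Java-level exceptions. The crash in the stack trace is a native segfault in leveldbjni's memcpy, which kills the whole JVM before any exception handler can run. A hedged sketch of the only real defense available on the Java side, failing fast if the DB directory is already unusable (the class and helper below are illustrative, not actual Spark or YARN code):

```java
import java.io.File;

// Sketch only: a hypothetical pre-flight check, not part of Spark.
// A native crash inside leveldbjni cannot be caught in Java, so the
// best a caller can do is refuse to open the DB on a directory that
// is already unreadable or unwritable, and surface a clear error.
public class RecoveryDirCheck {

  /** Returns true if dir exists (or can be created) and is readable and writable. */
  static boolean isUsable(File dir) {
    if (!dir.exists() && !dir.mkdirs()) {
      return false;
    }
    return dir.isDirectory() && dir.canRead() && dir.canWrite();
  }

  public static void main(String[] args) {
    // A temp directory should normally pass; a path on a failed disk
    // (like disk1 in this report) would fail this check instead of
    // crashing the process later inside a native write.
    File dir = new File(System.getProperty("java.io.tmpdir"), "nm-recovery-check");
    System.out.println("usable=" + isUsable(dir));
    dir.delete();
  }
}
```

> This does not help once the disk goes bad after the DB is open, which is why the comment above says the NodeManager is expected to fail in that case.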
> There was a problem with disk1, the disk backing the LevelDB store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

