hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-18541) [C++] Segfaults from JNI
Date Thu, 31 Aug 2017 22:30:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149659#comment-16149659 ]

Ted Yu edited comment on HBASE-18541 at 8/31/17 10:29 PM:
----------------------------------------------------------

The crash sometimes happened in netty:
{code}
(gdb) bt
#0  0x00007fb9cc51d25e in  ()
#1  0x00007fb9cc0c3d80 in [interpreted: bc = 20] io.netty.channel.nio.NioEventLoop.wakeup(boolean) ()
    at io/netty/channel/nio/NioEventLoop.java:645
#2  0x00007fb9cc0c3ffd in [interpreted: bc = 75] io.netty.util.concurrent.SingleThreadEventExecutor.execute(java.lang.Runnable) ()
    at io/netty/util/concurrent/SingleThreadEventExecutor.java:681
#3  0x00007fb9cc0c4042 in [interpreted: bc = 2] io.netty.channel.AbstractChannelHandlerContext.safeExecute(io.netty.util.concurrent.EventExecutor,java.lang.Runnable,io.netty.channel.ChannelPromise,java.lang.Object) ()
    at io/netty/channel/AbstractChannelHandlerContext.java:989
#4  0x00007fb9cc0c3ffd in [interpreted: bc = 95] io.netty.channel.AbstractChannelHandlerContext.write(java.lang.Object,boolean,io.netty.channel.ChannelPromise) ()
    at io/netty/channel/AbstractChannelHandlerContext.java:813
#5  0x00007fb9cc0c3ffd in [interpreted: bc = 34] io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(java.lang.Object,io.netty.channel.ChannelPromise) ()
    at io/netty/channel/AbstractChannelHandlerContext.java:782
#6  0x00007fb9cc0c3d80 in [interpreted: bc = 6] io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(java.lang.Object) ()
    at io/netty/channel/AbstractChannelHandlerContext.java:817
#7  0x00007fb9cc0c3d80 in [interpreted: bc = 5] io.netty.channel.DefaultChannelPipeline.writeAndFlush(java.lang.Object) ()
    at io/netty/channel/DefaultChannelPipeline.java:1011
#8  0x00007fb9cc0c3d80 in [interpreted: bc = 5] io.netty.channel.AbstractChannel.writeAndFlush(java.lang.Object) ()
    at io/netty/channel/AbstractChannel.java:289
#9  0x00007fb9cc0c3e54 in [interpreted: bc = 345] org.apache.hadoop.hbase.ipc.AsyncRpcChannelImpl.writeRequest(org.apache.hadoop.hbase.ipc.AsyncCall) ()
    at org/apache/hadoop/hbase/ipc/AsyncRpcChannelImpl.java:455
#10 0x00007fb9cc0c3ffd in [interpreted: bc = 164] org.apache.hadoop.hbase.ipc.AsyncRpcChannelImpl.callMethod(com.google.protobuf.Descriptors$MethodDescriptor,com.google.protobuf.Message,org.apache.hadoop.hbase.CellScanner,com.google.protobuf.Message,org.apache.hadoop.hbase.ipc.MessageConverter,org.apache.hadoop.hbase.ipc.IOExceptionConverter,long,int) ()
    at org/apache/hadoop/hbase/ipc/AsyncRpcChannelImpl.java:350
#11 0x00007fb9cc0c3e54 in [interpreted: bc = 54] org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(org.apache.hadoop.hbase.ipc.PayloadCarryingRpcController,com.google.protobuf.Descriptors$MethodDescriptor,com.google.protobuf.Message,com.google.protobuf.Message,org.apache.hadoop.hbase.security.User,java.net.InetSocketAddress,org.apache.hadoop.hbase.client.MetricsConnection$CallStats) ()
    at org/apache/hadoop/hbase/ipc/AsyncRpcClient.java:243
#12 0x00007fb9cc0c3d80 in [interpreted: bc = 37] org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor,org.apache.hadoop.hbase.ipc.PayloadCarryingRpcController,com.google.protobuf.Message,com.google.protobuf.Message,org.apache.hadoop.hbase.security.User,java.net.InetSocketAddress) ()
    at org/apache/hadoop/hbase/ipc/AbstractRpcClient.java:233
#13 0x00007fb9cc0c3d80 in [interpreted: bc = 28] org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor,com.google.protobuf.RpcController,com.google.protobuf.Message,com.google.protobuf.Message) ()
    at org/apache/hadoop/hbase/ipc/AbstractRpcClient.java:354
#14 0x00007fb9cc0c3e54 in [interpreted: bc = 24] org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(com.google.protobuf.RpcController,org.apache.hadoop.hbase.protobuf.generated.MasterProtos$IsMasterRunningRequest) ()
    at org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java:64354
#15 0x00007fb9cc0c3e54 in [interpreted: bc = 8] org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceState.isMasterRunning() ()
    at org/apache/hadoop/hbase/client/ConnectionImplementation.java:939
#16 0x00007fb9cc0c37d0 in [interpreted: bc = 10] org.apache.hadoop.hbase.client.ConnectionImplementation.isKeepAliveMasterConnectedAndRunning(org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceState) ()
    at org/apache/hadoop/hbase/client/ConnectionImplementation.java:1699
#17 0x00007fb9cc0c37d0 in [interpreted: bc = 12] org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService() ()
    at org/apache/hadoop/hbase/client/ConnectionImplementation.java:1287
{code}


was (Author: yuzhihong@gmail.com):
After changing the Hadoop version to 2.7.4, the loop of tests seems more stable.

[~enis]:
Mind giving it a try?

> [C++] Segfaults from JNI
> ------------------------
>
>                 Key: HBASE-18541
>                 URL: https://issues.apache.org/jira/browse/HBASE-18541
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Ted Yu
>         Attachments: 18541.v1.txt
>
>
> retry-test and multi-retry-test fail intermittently when run with
> {code}
> buck test --all --no-results-cache
> {code}
> or when run in a loop:
> {code}
> for i in `seq 1 10`; do buck test --no-results-cache core:retry-test || break 1; done
> {code}
> The problem seems to be within the JNI internals and usually happens at the create table
> method call. I was not able to inspect much, but the comments in our mini-cluster indicate
> that we may need to use global references instead of local ones. I suspect the problem happens
> when there is a GC run during the test, since the failure usually happens after some time (but
> almost always in the create table method).
> [~ted_yu] do you mind taking a look at this?
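
For reference, the "global references instead of local ones" idea from the description could look roughly like the sketch below. This is not the mini-cluster code from the HBase native client; ScopedGlobalRef and FakeEnv are hypothetical names used only for illustration. In JNI, a local reference is only valid in the thread that created it and within its local frame, while a global reference stays valid until it is explicitly deleted. The holder is templated so the sketch compiles without <jni.h>; with a real JVM, Env would be JNIEnv and Ref a jobject (or jclass).

```cpp
#include <cassert>

// RAII holder that promotes a JNI local reference to a global reference.
// In real code: Env = JNIEnv, Ref = jobject / jclass. They are template
// parameters here only so the sketch builds without a JDK (assumption).
template <typename Env, typename Ref>
class ScopedGlobalRef {
 public:
  // Creates a global reference from `local`, then releases the local one.
  ScopedGlobalRef(Env* env, Ref local) : env_(env), ref_(nullptr) {
    assert(env != nullptr);
    if (local != nullptr) {
      ref_ = static_cast<Ref>(env_->NewGlobalRef(local));
      env_->DeleteLocalRef(local);
    }
  }

  ~ScopedGlobalRef() { reset(); }

  // Non-copyable: exactly one owner per global reference.
  ScopedGlobalRef(const ScopedGlobalRef&) = delete;
  ScopedGlobalRef& operator=(const ScopedGlobalRef&) = delete;

  Ref get() const { return ref_; }

  // Releases the global reference, if any.
  void reset() {
    if (ref_ != nullptr) {
      env_->DeleteGlobalRef(ref_);
      ref_ = nullptr;
    }
  }

 private:
  // A JNIEnv pointer is per-thread, so this holder must be destroyed on
  // the thread that created it. Cross-thread use would need to obtain the
  // env from the JavaVM instead.
  Env* env_;
  Ref ref_;
};

// Hypothetical stand-in for JNIEnv that only counts live references,
// so the holder can be exercised without starting a JVM.
struct FakeEnv {
  int live_globals = 0;
  int live_locals = 1;  // pretend one local reference already exists
  void* NewGlobalRef(void* obj) { ++live_globals; return obj; }
  void DeleteGlobalRef(void*) { --live_globals; }
  void DeleteLocalRef(void*) { --live_locals; }
};
```

Whether this addresses the segfault is untested here; it only illustrates the reference-lifetime distinction the description points at.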



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
