accumulo-notifications mailing list archives

From "Adam J Shook (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4787) Numerous leaked CLOSE_WAIT threads from TabletServer
Date Fri, 26 Jan 2018 17:29:00 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341318#comment-16341318 ]

Adam J Shook commented on ACCUMULO-4787:
----------------------------------------

This thread from a {{jstack}} looks promising.  From a quick scan of the code, it looks like
the input stream is opened but never closed.

{code}
Thread 43276: (state = IN_JAVA)
 - org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(org.apache.hadoop.hdfs.BlockReader, int, int) @bci=7, line=782 (Compiled frame; information may be imprecise)
 - org.apache.hadoop.hdfs.DFSInputStream.readBuffer(org.apache.hadoop.hdfs.DFSInputStream$ReaderStrategy, int, int, java.util.Map) @bci=10, line=838 (Compiled frame)
 - org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(org.apache.hadoop.hdfs.DFSInputStream$ReaderStrategy, int, int) @bci=172, line=898 (Compiled frame)
 - org.apache.hadoop.hdfs.DFSInputStream.read(byte[], int, int) @bci=37, line=942 (Compiled frame)
 - org.apache.hadoop.hdfs.DFSInputStream.read() @bci=21, line=742 (Compiled frame)
 - java.io.DataInputStream.readByte() @bci=4, line=265 (Compiled frame)
 - org.apache.hadoop.io.WritableUtils.readVLong(java.io.DataInput) @bci=34, line=315 (Compiled frame)
 - org.apache.accumulo.server.data.ServerMutation.readFields(java.io.DataInput) @bci=17, line=55 (Compiled frame)
 - org.apache.accumulo.tserver.logger.LogFileValue.readFields(java.io.DataInput) @bci=38, line=45 (Compiled frame)
 - org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.getWalEdits(org.apache.accumulo.core.replication.ReplicationTarget, java.io.DataInputStream, org.apache.hadoop.fs.Path, org.apache.accumulo.server.replication.proto.Replication$Status, long, java.util.Set) @bci=65, line=709 (Compiled frame)
 - org.apache.accumulo.tserver.replication.AccumuloReplicaSystem$WalClientExecReturn.execute(org.apache.accumulo.core.replication.thrift.ReplicationServicer$Client) @bci=28, line=538 (Compiled frame)
 - org.apache.accumulo.tserver.replication.AccumuloReplicaSystem$WalClientExecReturn.execute(java.lang.Object) @bci=5, line=513 (Compiled frame)
 - org.apache.accumulo.core.client.impl.ReplicationClient.executeServicerWithReturn(org.apache.accumulo.core.client.impl.ClientContext, com.google.common.net.HostAndPort, org.apache.accumulo.core.client.impl.ClientExecReturn, long) @bci=14, line=191 (Compiled frame)
 - org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.replicateLogs(org.apache.accumulo.core.client.impl.ClientContext, com.google.common.net.HostAndPort, org.apache.accumulo.core.replication.ReplicationTarget, org.apache.hadoop.fs.Path, org.apache.accumulo.server.replication.proto.Replication$Status, long, java.lang.String, org.apache.accumulo.core.security.thrift.TCredentials, org.apache.accumulo.server.replication.ReplicaSystemHelper, org.apache.hadoop.security.UserGroupInformation, long) @bci=440, line=436 (Compiled frame)
 - org.apache.accumulo.tserver.replication.AccumuloReplicaSystem._replicate(org.apache.hadoop.fs.Path, org.apache.accumulo.server.replication.proto.Replication$Status, org.apache.accumulo.core.replication.ReplicationTarget, org.apache.accumulo.server.replication.ReplicaSystemHelper, org.apache.accumulo.core.conf.AccumuloConfiguration, org.apache.accumulo.core.client.impl.ClientContext, org.apache.hadoop.security.UserGroupInformation) @bci=295, line=297 (Compiled frame)
 - org.apache.accumulo.tserver.replication.AccumuloReplicaSystem.replicate(org.apache.hadoop.fs.Path, org.apache.accumulo.server.replication.proto.Replication$Status, org.apache.accumulo.core.replication.ReplicationTarget, org.apache.accumulo.server.replication.ReplicaSystemHelper) @bci=232, line=216 (Compiled frame)
 - org.apache.accumulo.tserver.replication.ReplicationProcessor.process(java.lang.String, byte[]) @bci=312, line=134 (Compiled frame)
 - org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run() @bci=28, line=107 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - org.apache.accumulo.fate.util.LoggingRunnable.run() @bci=4, line=35 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
{code}
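
For illustration only, the usual guard against an "opened but never closed" stream is a try-with-resources block. The sketch below is not the Accumulo code: {{WalReadSketch}}, {{TrackingStream}} and {{readFirstByte}} are hypothetical names, and an in-memory stream stands in for the HDFS input stream seen in the trace. It only shows that the stream is closed on both the normal and the exception path.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WalReadSketch {

  /** Hypothetical wrapper that records whether close() was called. */
  static final class TrackingStream extends FilterInputStream {
    boolean closed = false;

    TrackingStream(InputStream in) {
      super(in);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  /**
   * Reads a single byte. The try-with-resources block closes the
   * DataInputStream (and the stream it wraps) whether readByte()
   * returns normally or throws.
   */
  static int readFirstByte(InputStream raw) throws IOException {
    try (DataInputStream in = new DataInputStream(raw)) {
      return in.readByte();
    }
  }

  public static void main(String[] args) throws IOException {
    // Normal path: one byte available.
    TrackingStream ok = new TrackingStream(new ByteArrayInputStream(new byte[] {42}));
    System.out.println(readFirstByte(ok) + " closed=" + ok.closed);

    // Failure path: empty stream, so readByte() throws EOFException.
    TrackingStream empty = new TrackingStream(new ByteArrayInputStream(new byte[0]));
    try {
      readFirstByte(empty);
    } catch (EOFException e) {
      System.out.println("EOF closed=" + empty.closed);
    }
  }
}
```

Whether this pattern applies directly depends on who owns the stream in the replication code path, which the trace alone does not show.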

> Numerous leaked CLOSE_WAIT threads from TabletServer
> ----------------------------------------------------
>
>                 Key: ACCUMULO-4787
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4787
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.8.1
>         Environment: * Ubuntu 14.04
> * HDFS 2.6.0 and 2.5.0 (in the middle of an upgrade cycle)
> * ZooKeeper 3.4.6
> * Accumulo 1.8.1
> * HotSpot 1.8.0_121
>            Reporter: Adam J Shook
>            Assignee: Adam J Shook
>            Priority: Minor
>
> I'm running into an issue across all environments where TabletServers are occupying a
> large number of ports in a CLOSE_WAIT state writing to a DataNode at port 50010.  I'm seeing
> numbers from around 12,000 to 20,000 ports.  In some instances, there were over 68k and
> it was preventing other applications from getting a free port, so they would fail to start
> (which is how we found this in the first place).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
