hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-88) Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite
Date Mon, 21 Jul 2014 21:42:43 GMT

     [ https://issues.apache.org/jira/browse/HDFS-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-88.
----------------------------------

    Resolution: Incomplete

I'm going to close this as stale. I suspect this issue has gone away with the two fixes referenced.

> Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite
> -----------------------------------------------------------------------
>
>                 Key: HDFS-88
>                 URL: https://issues.apache.org/jira/browse/HDFS-88
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: stack
>
> We see this hang only rarely, but when it happens it locks up the application. We've seen it in at least 0.18.x and 0.19.x (we don't have much experience with 0.20.x HDFS yet).
> Here we're doing a SequenceFile.Writer#append:
> {code}
> "IPC Server handler 9 on 60020" daemon prio=10 tid=0x00007fef1c3f0400 nid=0x7470 waiting for monitor entry [0x0000000042d18000..0x0000000042d189f0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486)
> 	- waiting to lock <0x00007fef38ecc138> (a java.util.LinkedList)
> 	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
> 	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
> 	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
> 	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
> 	at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
> 	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> 	- locked <0x00007fef38ecbdb8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:107)
> 	- locked <0x00007fef38e09fc0> (a org.apache.hadoop.fs.FSDataOutputStream)
> 	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
> 	- locked <0x00007fef38e09f30> (a org.apache.hadoop.io.SequenceFile$Writer)
> 	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980)
> 	- locked <0x00007fef38e09f30> (a org.apache.hadoop.io.SequenceFile$Writer)
> 	at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461)
> 	at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421)
> 	- locked <0x00007fef29ad9588> (a java.lang.Integer)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184)
> 	at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:616)
> 	at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> {code}
> The DataStreamer that is supposed to be servicing the above writeChunk is stuck here:
> {code}
> "DataStreamer for file /hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 block blk_-7436808403424765554_553837" daemon prio=10 tid=0x0000000001c84c00 nid=0x7125 in Object.wait() [0x00000000409b3000..0x00000000409b3d70]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:709)
> 	- locked <0x00007fef39520bb8> (a org.apache.hadoop.ipc.Client$Call)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> 	at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
> 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
> 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
> 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
> 	at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
> 	at org.apache.hadoop.dfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:139)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2185)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
> 	- locked <0x00007fef38ecc138> (a java.util.LinkedList)
> {code}
> writeChunk is trying to synchronize on dataQueue (the java.util.LinkedList at <0x00007fef38ecc138> in both traces).
> dataQueue is held by DataStreamer#run, which is down in processDatanodeError, waiting on an RPC while trying to recover from a problem with a block.
> Another example of the hang and some more detail can be found over in HBASE-667.
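
The lock interaction described in the quoted report can be reproduced in miniature. The following is a minimal sketch, not the DFSClient code: the class name, thread names, and the CountDownLatch standing in for the pending RPC reply are all assumptions made for illustration. It shows one thread holding the monitor of a LinkedList while it waits on a call that has not returned, and a second thread stuck in BLOCKED state trying to enter the same monitor.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the locking pattern in the report; not Hadoop code. */
class DataQueueHangSketch {

    /**
     * Starts a "streamer" that holds the dataQueue monitor while waiting on a
     * simulated RPC, then a "writer" that tries to enter the same monitor.
     * Returns the writer's state as observed while the streamer still holds the lock.
     */
    static Thread.State observeWriterState() {
        final LinkedList<byte[]> dataQueue = new LinkedList<>();
        final CountDownLatch lockHeld = new CountDownLatch(1);
        // Assumption: this latch plays the role of the RPC reply that never arrives.
        final CountDownLatch rpcReply = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            synchronized (dataQueue) {          // like DataStreamer#run
                lockHeld.countDown();
                try {
                    rpcReply.await();           // like waiting in processDatanodeError
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "DataStreamer-sketch");

        Thread writer = new Thread(() -> {
            synchronized (dataQueue) {          // like writeChunk
                dataQueue.addLast(new byte[0]);
            }
        }, "writer-sketch");

        try {
            streamer.start();
            lockHeld.await();                   // make sure the streamer owns the monitor
            writer.start();

            // Poll until the writer reaches the contended monitor (or give up after 5 s).
            Thread.State state = writer.getState();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = writer.getState();
            }

            rpcReply.countDown();               // deliver the "reply" so both threads finish
            streamer.join();
            writer.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("writer state while streamer holds dataQueue: "
                + observeWriterState());
    }
}
```

In this sketch the writer only makes progress once the simulated reply is delivered. In the traces above no such reply arrives, so the handler thread stays in BLOCKED indefinitely.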



--
This message was sent by Atlassian JIRA
(v6.2#6252)
