hadoop-hdfs-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-871) Balancer can hang in PendingBlockMove
Date Tue, 29 Jul 2014 21:49:40 GMT

     [ https://issues.apache.org/jira/browse/HDFS-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-871.
-----------------------------------

    Resolution: Fixed

likely stale.

> Balancer can hang in PendingBlockMove
> -------------------------------------
>
>                 Key: HDFS-871
>                 URL: https://issues.apache.org/jira/browse/HDFS-871
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 0.20.1
>         Environment: Yahoo 0.20
>            Reporter: Andrew Ryan
>         Attachments: balancer-jstack.out
>
>
> We started the balancer with default options (-threshold 10). It ran fine for a few hours, then hung: the process was still alive, but no balancing was taking place.
> At the time of the hang, jstack showed three threads in RUNNABLE state. Subsequent jstacks taken minutes and hours later showed the same three threads in the same place, so I don't think requests were being restarted; the threads appear to be hung. My best guess is that there is no timeout on these requests to the namenode, and there needs to be.
> I'll attach the full jstack output, but here's a sample thread. They are all stuck in the same place.
> "pool-1-thread-972" prio=10 tid=0x00002aaafc23a800 nid=0x27a8 runnable [0x00002aab0a9a2000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         - locked <0x00002aaaebdbe158> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readShort(DataInputStream.java:295)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.receiveResponse(Balancer.java:371)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.dispatch(Balancer.java:326)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove.access$1800(Balancer.java:232)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$PendingBlockMove$1.run(Balancer.java:393)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
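
For illustration, the kind of read timeout the report asks for can be sketched in standalone Java. This is not the Balancer's code and not necessarily how the issue was addressed: the class name ReadTimeoutSketch, the method readTimesOut, and the 500 ms value are hypothetical. Only Socket.setSoTimeout and SocketTimeoutException are standard java.net APIs. A listening socket that never replies stands in for the unresponsive peer, and the client performs the same DataInputStream.readShort() call seen in the trace.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    /**
     * Connects to the peer and waits for a two-byte status.
     * Returns true if the read timed out, false if a reply arrived.
     * (Hypothetical helper, for illustration only.)
     */
    static boolean readTimesOut(InetSocketAddress peer, int timeoutMillis) throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(peer, timeoutMillis);
            // Without this call, a blocking read waits indefinitely on a silent peer,
            // which is the state the stack trace shows (socketRead0, RUNNABLE).
            sock.setSoTimeout(timeoutMillis);
            DataInputStream in = new DataInputStream(sock.getInputStream());
            try {
                in.readShort();
                return false;
            } catch (SocketTimeoutException e) {
                // The read gave up after timeoutMillis; the caller can now fail or retry.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A bound socket that never accepts or writes: the connection is established
        // by the OS backlog, but no response is ever sent.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress addr =
                new InetSocketAddress(silentPeer.getInetAddress(), silentPeer.getLocalPort());
            System.out.println("read timed out: " + readTimesOut(addr, 500));
        }
    }
}
```

With a timeout set, a worker thread stuck on a silent connection gets an exception it can handle, instead of staying in socketRead0 across jstacks taken hours apart.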



--
This message was sent by Atlassian JIRA
(v6.2#6252)
