hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11377) Balancer hung due to no available mover threads
Date Mon, 06 Feb 2017 05:26:41 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-11377:
-----------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

The remove operation should be safe because the method {{removePendingBlock}} is {{synchronized}}.
The failed test is not related. Committed to trunk and branch-2. Thanks [~zhaoyunjiong] for
the contribution and thanks [~manojg] for the review!
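Why a {{synchronized}} remove is safe can be sketched as follows. This is a minimal, hypothetical model, assuming a per-DataNode pending queue backed by a plain HashSet; the class PendingQueue and the helper runConcurrentRemoval are illustrative names, not Hadoop's Dispatcher API. Because add, remove and the emptiness check all lock the same monitor, several threads can remove entries concurrently without corrupting the set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a per-DataNode pending-move queue.
// This is NOT Hadoop's Dispatcher.DDatanode implementation.
public class PendingQueueDemo {
    static class PendingQueue {
        private final Set<Long> pending = new HashSet<>();

        // All three methods lock on the same monitor (this), so concurrent
        // callers never modify the non-thread-safe HashSet at the same time.
        synchronized boolean addPendingBlock(long blockId) {
            return pending.add(blockId);
        }

        synchronized boolean removePendingBlock(long blockId) {
            return pending.remove(blockId);
        }

        synchronized boolean isPendingQEmpty() {
            return pending.isEmpty();
        }
    }

    // Queues n block IDs, then removes them from several threads at once.
    // Returns true if the queue ends up empty.
    static boolean runConcurrentRemoval(int n, int threads) throws InterruptedException {
        PendingQueue q = new PendingQueue();
        for (long i = 0; i < n; i++) {
            q.addPendingBlock(i);
        }
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            Thread w = new Thread(() -> {
                // Each worker removes its own stride of block IDs.
                for (long i = offset; i < n; i += threads) {
                    q.removePendingBlock(i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return q.isPendingQEmpty();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrentRemoval(10_000, 8)); // prints true
    }
}
```

Without the {{synchronized}} modifier, concurrent HashSet.remove calls could leave the set in an inconsistent state; with it, the queue reliably drains.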

> Balancer hung due to no available mover threads
> -----------------------------------------------
>
>                 Key: HDFS-11377
>                 URL: https://issues.apache.org/jira/browse/HDFS-11377
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>             Fix For: 2.9.0, 3.0.0-alpha3
>
>         Attachments: HDFS-11377.001.patch, HDFS-11377.002.patch
>
>
> When running the Balancer on a large cluster with more than 3000 DataNodes, it might hang because of "No mover threads available".
> The stack trace below shows it waiting forever:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007ff6cc014800 nid=0x6b2c waiting on condition [0x00007ff6d1bad000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.waitForMoveCompletion(Dispatcher.java:1043)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchBlockMoves(Dispatcher.java:1017)
>         at org.apache.hadoop.hdfs.server.balancer.Dispatcher.dispatchAndCheckContinue(Dispatcher.java:981)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.runOneIteration(Balancer.java:611)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.run(Balancer.java:663)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:776)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:905)
> {code}
> The log contains many WARN messages about "No mover threads available":
> {quote}
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13700554102_1112815018180 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_4009558842_1103118359883 with size=268435456 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.137:50010
> 2017-01-26 15:36:40,085 WARN org.apache.hadoop.hdfs.server.balancer.Dispatcher: No mover threads available: skip moving blk_13881956058_1112996460026 with size=133509566 from 10.115.67.137:50010:DISK to 10.140.21.55:50010:DISK through 10.115.67.36:50010
> {quote}
> What happens here is that when no mover threads are available, the skipped move is never removed from the pending queue, so DDatanode.isPendingQEmpty() keeps returning false and the Balancer hangs.
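The hang and its fix can be modeled in a few lines. This is a hypothetical, simplified sketch, not Hadoop's code: the names Target, executePendingMove, waitForMoveCompletion and simulate are illustrative, and unlike the real wait loop (which sleeps and polls indefinitely) the demo gives up after maxPolls so it terminates. It assumes, per the resolution comment, that the fix removes the skipped move via removePendingBlock.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the HDFS-11377 hang. Names are
// illustrative and do not reproduce Hadoop's Dispatcher exactly.
public class MoverHangDemo {
    static class Target {
        private final Set<String> pending = new HashSet<>();
        private final boolean hasMoverThreads;

        Target(boolean hasMoverThreads) {
            this.hasMoverThreads = hasMoverThreads;
        }

        synchronized void addPendingBlock(String block) { pending.add(block); }
        synchronized void removePendingBlock(String block) { pending.remove(block); }
        synchronized boolean isPendingQEmpty() { return pending.isEmpty(); }

        // Queues a move; if no mover thread is available the move is skipped.
        // With applyFix == false the skipped move stays in the pending queue.
        void executePendingMove(String block, boolean applyFix) {
            addPendingBlock(block);
            if (!hasMoverThreads) {
                System.out.println("WARN No mover threads available: skip moving " + block);
                if (applyFix) {
                    removePendingBlock(block); // drop the skipped move
                }
                return;
            }
            // A real mover thread would copy the block here, then dequeue it.
            removePendingBlock(block);
        }
    }

    // Polls until the pending queue is empty, giving up after maxPolls
    // (demo-only cap). Returns true if the queue drained.
    static boolean waitForMoveCompletion(Target t, int maxPolls) throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            if (t.isPendingQEmpty()) {
                return true;
            }
            Thread.sleep(10);
        }
        return false;
    }

    static boolean simulate(boolean applyFix) throws InterruptedException {
        Target t = new Target(false); // no mover threads available
        t.executePendingMove("blk_1", applyFix);
        return waitForMoveCompletion(t, 5);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without fix, drained: " + simulate(false));
        System.out.println("with fix, drained: " + simulate(true));
    }
}
```

Without the removal, the queue never drains and an uncapped loop would sleep forever, matching the TIMED_WAITING frame in waitForMoveCompletion above.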



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

