hadoop-hdfs-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3787) BlockManager#close races with ReplicationMonitor#run
Date Fri, 10 Aug 2012 01:33:19 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Isaacson updated HDFS-3787:
--------------------------------

    Attachment: hdfs-3787-2.txt

Updated patch implementing Karthik's suggested rework, plus my join(3000) proposal.

Oddly, Jira seems to be missing the "Submit patch" button, so I can't trigger Jenkins.
                
> BlockManager#close races with ReplicationMonitor#run
> ----------------------------------------------------
>
>                 Key: HDFS-3787
>                 URL: https://issues.apache.org/jira/browse/HDFS-3787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andy Isaacson
>            Assignee: Andy Isaacson
>            Priority: Minor
>         Attachments: hdfs-3787-2.txt, hdfs-3787.txt
>
>
> We saw {{TestDirectoryScanner}} fail during shutdown:
> {code}
> 2012-08-09 12:17:19,844 WARN  datanode.DataNode (BPServiceActor.java:run(683)) - Ending block pool service for: Block pool BP-610123021-172.29.121.238-1344539835759 (storage id DS-1581877160-172.29.121.238-43609-1344539837880) service to localhost/127.0.0.1:40012
> ...
> 2012-08-09 12:17:19,876 FATAL blockmanagement.BlockManager (BlockManager.java:run(3039)) - ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.getBlockCollection(BlocksMap.java:101)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1141)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1116)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3070)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3032)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> Inspecting the code, it appears that {{BlockManager#close -> BlocksMap#close}} can set {{blocks}} to {{null}} while {{computeDatanodeWork}} is running.
> The fix seems simple -- have {{close}} just set an exit flag, and have {{ReplicationMonitor#run}} call {{BlocksMap#close}}.
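
The fix described above might look roughly like the following. This is a minimal, self-contained sketch and not the attached patch: {{ShutdownSketch}}, {{SketchBlocksMap}}, {{SketchBlockManager}} and the {{shouldRun}} flag are hypothetical stand-ins for the real HDFS classes. It only illustrates the idea that {{close}} sets an exit flag and waits with a bounded {{join(3000)}}, while the monitor thread itself tears down the map after its last iteration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the HDFS classes; names are illustrative only.
final class ShutdownSketch {

    /** Stand-in for BlocksMap: close() nulls the backing map, so a later get() would NPE. */
    static class SketchBlocksMap {
        private Map<Long, String> blocks = new HashMap<>();

        String get(long id) { return blocks.get(id); }
        void close() { blocks = null; }
        boolean isClosed() { return blocks == null; }
    }

    /** Stand-in for BlockManager with its replication monitor thread. */
    static class SketchBlockManager {
        private final SketchBlocksMap blocksMap = new SketchBlocksMap();
        private volatile boolean shouldRun = true;   // exit flag set by close()
        private volatile Throwable failure;          // records any exception in the monitor
        private final Thread monitor =
            new Thread(this::runMonitor, "ReplicationMonitorSketch");

        void activate() { monitor.start(); }

        private void runMonitor() {
            try {
                while (shouldRun) {
                    blocksMap.get(1L);               // stands in for computeDatanodeWork()
                    try {
                        Thread.sleep(1);
                    } catch (InterruptedException e) {
                        // Woken by close(); fall through and re-check the flag.
                    }
                }
            } catch (RuntimeException e) {
                failure = e;
            } finally {
                // Only the monitor thread closes the map, after its last lookup,
                // so no lookup can observe a null backing map.
                blocksMap.close();
            }
        }

        /** Requests shutdown and waits up to 3 seconds for the monitor to exit. */
        void close() {
            shouldRun = false;
            monitor.interrupt();
            try {
                monitor.join(3000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the caller's interrupt status
            }
        }

        boolean monitorAlive() { return monitor.isAlive(); }
        boolean mapClosed() { return blocksMap.isClosed(); }
        Throwable failure() { return failure; }
    }
}
```

In this sketch, calling {{activate()}} followed by {{close()}} should leave the map closed and the monitor stopped without a {{NullPointerException}}. The bounded join is a trade-off: if the monitor is stuck, {{close}} returns after the timeout and the map is left for the monitor to close whenever it does exit.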


        
