From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10780) Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes fails
Date Wed, 31 Aug 2016 01:22:21 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Manoj Govindassamy updated HDFS-10780:
--------------------------------------
    Attachment: HDFS-10780.001.patch

More details on Issue 1:

*Problem:*
— After pipeline recovery (triggered by data streaming failures), block replication to replace the stale replica does not happen
— TestDataNodeHotSwapVolumes fails with the “TimeoutException: Timed out waiting for /test to reach 3 replicas” signature

*Analysis:*
— Assume the write pipeline is DN1 —> DN2 —> DN3
— For the {{UNDER_CONSTRUCTION}} block, NameNode sets the *expected replicas* to DN1, DN2, DN3
— DN1 encounters a write issue (here, the volume is removed while the write is in progress)
— The client detects the pipeline issue, triggers pipeline recovery and gets the new write pipeline DN2 —> DN3

— On a successful {{FSNameSystem::updatePipeline}} request from the client, NameNode bumps up the Generation Stamp (from 001 to 002) of the under-construction (that is, the last) block of the file.
— All the current *expected replicas* become stale, as they carry an older Generation Stamp than the new one after the pipeline update.
— NameNode resets the *expected replicas* to the new set of storage ids from the updated pipeline, which is {DN2, DN3} (see the sketch below)
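
To make the effect of the pipeline update concrete, here is a rough sketch (illustrative class and field names only, not the actual NameNode code) of what happens to the under-construction block's metadata:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in for the NameNode's under-construction block metadata.
class UcBlockSketch {
    long generationStamp = 1;                            // GS 001
    List<String> expectedReplicas =
            new ArrayList<>(Arrays.asList("DN1", "DN2", "DN3"));

    // Rough equivalent of what a successful updatePipeline request triggers.
    void onUpdatePipeline(long newGenStamp, List<String> newPipeline) {
        generationStamp = newGenStamp;                   // GS 001 -> 002
        // Expected replicas reset to the recovered pipeline: {DN2, DN3}.
        expectedReplicas = new ArrayList<>(newPipeline);
    }
}
{code}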

— DNs send their Incremental Block Reports (IBRs) to NameNode. IBRs can carry blocks with the old or the new Generation Stamp, and the reported replicas can be in any state: FINALIZED, RBW, RWR, etc. (the staleness check by Generation Stamp is sketched after this list)
— Assume the stale replica DN1 sends an IBR with the following
— — Replica Block State: RBW
— — Replica Block GS: 001 (stale)
— Assume the good replicas DN2 and DN3 send IBRs with the following
— — Replica Block State: FINALIZED
— — Replica Block GS: 002 (good)
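
The staleness here is purely a Generation Stamp comparison; a tiny hypothetical helper (not an HDFS API) to make that explicit:

{code:java}
// Hypothetical helper: a reported replica is stale when its generation stamp
// is older than the one the NameNode now records for the block.
final class GenStampCheck {
    static boolean isStale(long reportedGs, long blockGsAtNameNode) {
        return reportedGs < blockGsAtNameNode;
    }

    public static void main(String[] args) {
        long blockGs = 2;                          // GS 002 after pipeline update
        System.out.println(isStale(1, blockGs));   // DN1 (RBW, GS 001)        -> true
        System.out.println(isStale(2, blockGs));   // DN2/DN3 (FINALIZED, 002) -> false
    }
}
{code}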


— When {{BlockManager::processAndHandleReportedBlock}} processes Incremental Block Reports, NameNode does not check the Generation Stamp of replicas reported in RBW/RWR states until the stored block is COMPLETE. Since the block state at the NN is still UNDER_CONSTRUCTION, the *stale RBW block from DN1 gets accepted* (simplified below)
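
A simplified rendering of that decision (trimmed to the RBW/RWR path discussed here, not the real {{checkReplicaCorrupt}} body):

{code:java}
// Simplified, illustrative form of the corruption decision for an RBW/RWR
// replica coming in via an incremental block report.
final class IbrCorruptionCheckSketch {
    // The generation stamp is only compared once the stored block is COMPLETE,
    // so DN1's stale RBW replica (GS 001) is accepted while the block is still
    // UNDER_CONSTRUCTION.
    static boolean isCorruptRbwOrRwr(boolean storedBlockComplete,
                                     long reportedGs, long storedGs) {
        if (!storedBlockComplete) {
            return false;          // GS mismatch ignored -> stale replica accepted
        }
        return reportedGs != storedGs;
    }
}
{code}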

— {{BlockManager::addStoredBlockUnderConstruction}} assumes the replica block from the corrupt DN1 is a good one and adds DN1’s StorageInfo to the expected replica locations. Refer: {{BlockUnderConstructionFeature::addReplicaIfNotPresent}}. Thus the *expected replicas* again become (DN1, DN2, DN3), as illustrated below.
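
Continuing the earlier sketch (names are illustrative only): once the stale report is treated as good, DN1's storage lands back in the expected locations.

{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative-only model of the expected replica locations of the
// under-construction block after the stale RBW report is accepted.
final class ExpectedReplicasSketch {
    private final Set<String> expected = new LinkedHashSet<>();

    // Rough analogue of addReplicaIfNotPresent: record the reporting storage
    // if it is not already tracked.
    void addReplicaIfNotPresent(String storageId) {
        expected.add(storageId);
    }

    public static void main(String[] args) {
        ExpectedReplicasSketch block = new ExpectedReplicasSketch();
        block.addReplicaIfNotPresent("DN2");   // from the updated pipeline
        block.addReplicaIfNotPresent("DN3");
        block.addReplicaIfNotPresent("DN1");   // stale RBW report wrongly accepted
        System.out.println(block.expected);    // [DN2, DN3, DN1]: back to 3 locations
    }
}
{code}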

— Later, when the client closes the file, {{FSNameSystem}} moves all the *expected replicas* to pendingReconstruction. Refer: {{FSNameSystem::addCommittedBlocksToPending}}

— {{BlockManager::checkRedundancy}} mistakenly treats the pendingReconstruction count of 1 (for DN1) as reconstruction already in progress; adding it to the live replica count of 2 (for DN2, DN3), it decides no more reconstruction is needed since the total matches the configured replication factor of 3 (see the arithmetic sketched below).
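
The arithmetic behind that decision, with the numbers from this scenario (a back-of-the-envelope sketch, not the real {{checkRedundancy}} code):

{code:java}
// Back-of-the-envelope version of the redundancy decision: 1 pending (the
// stale DN1 entry) + 2 live (DN2, DN3) appears to satisfy replication = 3,
// so no reconstruction work is scheduled even though only 2 good replicas exist.
final class RedundancyCheckSketch {
    static boolean needsReconstruction(int live, int pending, int replication) {
        return live + pending < replication;
    }

    public static void main(String[] args) {
        int live = 2;         // DN2, DN3 (GS 002)
        int pending = 1;      // DN1, counted as in-flight but actually stale
        System.out.println(needsReconstruction(live, pending, 3));   // false
    }
}
{code}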

— Since no block reconstruction is ever triggered to replace DN1’s stale replica, the test times out waiting for the replication factor of 3.


*Fix:*

— I believe the core issue here is in the processing of IBRs from stale replicas. Either

— — (A) {{BlockManager::checkReplicaCorrupt}} has to tag the block as corrupt when the replica state is RBW and the block is not complete, OR
— — (B) {{BlockManager::addStoredBlockUnderConstruction}} should not add the corrupt replica to the *expected replicas* for the under-construction block

The attached patch implements fix (B); a sketch of the idea follows below. It also adds a unit test that explicitly checks the expected replica count under the above sequence of events.
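
To be clear, the following is only a sketch of the idea behind fix (B), not the contents of HDFS-10780.001.patch: reject a reported replica as an expected location when its Generation Stamp is behind the block's.

{code:java}
// Sketch only, not the actual patch. Idea behind fix (B): before recording
// the reporting storage as an expected location for the under-construction
// block, drop replicas whose generation stamp is stale.
final class FixBSketch {
    static boolean shouldAddExpectedLocation(long reportedGs, long blockGs) {
        // A replica reported with an older GS predates the pipeline recovery
        // and must not count toward the expected replicas.
        return reportedGs >= blockGs;
    }

    public static void main(String[] args) {
        long blockGs = 2;                                            // GS 002
        System.out.println(shouldAddExpectedLocation(1, blockGs));   // DN1     -> false
        System.out.println(shouldAddExpectedLocation(2, blockGs));   // DN2/DN3 -> true
    }
}
{code}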

[~eddyxu], [~andrew.wang], [~yzhangal], can you please take a look at the patch?

> Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes
fails
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10780
>                 URL: https://issues.apache.org/jira/browse/HDFS-10780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10780.001.patch
>
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can run into issues such as timeouts or an unreachable datanode; in this test case the failure is an induced one, as one of the volumes in a datanode is removed while the block write is in progress. Digging further into the logs, when the problem happens in the write pipeline, the error recovery does not proceed as expected, leading to block replication never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec <<<
FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
 Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 replicas
> Results :
> Tests in error: 
>   TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
» Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> Following exceptions are not expected in this test run
> {noformat}
>  614 2016-08-10 12:30:11,269 [DataXceiver for client DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active connections is: 2
>  615 java.lang.IllegalMonitorStateException
>  616         at java.lang.Object.wait(Native Method)
>  617         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
>  618         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
>  619         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
>  620         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
>  720 2016-08-10 12:30:11,287 [DataNode: [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/, [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]] heartbeating to localhost/127.0.0.1:58788] ERROR datanode.DataNode (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
>  721 java.lang.NullPointerException
>  722         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
>  723         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
>  724         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
>  725         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
>  726         at java.lang.Thread.run(Thread.java:745)
> {noformat}


