hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10780) Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes fails
Date Sat, 27 Aug 2016 00:20:20 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Manoj Govindassamy updated HDFS-10780:
--------------------------------------
    Summary: Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes
fails  (was: Block replication not happening on removing a volume when data being written
to a datanode -- TestDataNodeHotSwapVolumes fails)

> Block replication not proceeding after pipeline recovery -- TestDataNodeHotSwapVolumes
fails
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10780
>                 URL: https://issues.apache.org/jira/browse/HDFS-10780
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>
> TestDataNodeHotSwapVolumes occasionally fails in the unit test testRemoveVolumeBeingWrittenForDatanode.
 Data write pipeline can have issues as there could be timeouts, data node not reachable etc,
and in this test case it was more of induced one as one of the volumes in a datanode is removed
while block write is in progress. Digging further in the logs, when the problem happens in
the write pipeline, the error recovery is not happening as expected leading to block replication
never catching up.
> Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec <<<
FAILURE! - in org.apache.hadoop.hdfs.serv
> testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)
 Time elapsed: 44.354 se
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 replicas
> Results :
> Tests in error: 
>   TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714
ยป Timeout
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> Following exceptions are not expected in this test run
> {noformat}
>  614 2016-08-10 12:30:11,269 [DataXceiver for client DFSClient_NONMAPREDUCE_-640082112_10
at /127.0.0.1:58805 [Receiving block BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]]
DEBUG datanode.Da     taNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active
connections is: 2
>  615 java.lang.IllegalMonitorStateException
>  616         at java.lang.Object.wait(Native Method)
>  617         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
>  618         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
>  619         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
>  620         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
> {noformat}
> {noformat}
>  720 2016-08-10 12:30:11,287 [DataNode: [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-projec     t/hadoop-hdfs/target/test/data/dfs/data/data2/]]
 heartbeating to localhost/127.0.0.1:58788] ERROR datanode.DataNode (BPServiceActor.java:run(768))
- Exception in BPOfferService for Block pool BP-18529     88604-172.16.3.66-1470857409044
(Datanode Uuid 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
>  721 java.lang.NullPointerException
>  722         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
>  723         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
>  724         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
>  725         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
>  726         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message