hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ade (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10605) Can not synchronized call method of object and Mockito.spy(object), So UT:testRemoveVolumeBeingWritten passed but maybe deadlock online
Date Mon, 11 Jul 2016 09:57:11 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

ade updated HDFS-10605:
-----------------------
    Description: 
The UT: TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten can be ran successful, but
deadlock like HDFS-9874 maybe happen online.
* UT: 
{code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
    dn.data = Mockito.spy(data);
    LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" + dn.data.hashCode());
    doAnswer(new Answer<Object>() {
          public Object answer(InvocationOnMock invocation)
              throws IOException, InterruptedException {
            Thread.sleep(1000);
            // Bypass the argument to FsDatasetImpl#finalizeBlock to verify that
            // the block is not removed, since the volume reference should not
            // be released at this point.
            data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
            return null;
          }
        }).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
{code}
2 thread can run synchronized method dn.data.removeVolumes and data.finalizeBlock concurrently
because dn.data(mocked) and data is not the same object.
{noformat}
2016-07-11 16:16:07,788 INFO  [Thread-0] datanode.TestDataNodeHotSwapVolumes (TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599))
- data hash:1903955157; dn.data hash:1508483764
2016-07-11 16:16:07,801 INFO  [Thread-157] datanode.DataNode (DataNode.java:reconfigurePropertyImpl(456))
- Reconfiguring dfs.datanode.data.dir to [DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
2016-07-11 16:16:07,810 WARN  [Thread-157] common.Util (Util.java:stringAsURI(56)) - Path
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
should be specified as a URI in configuration files. Please update hdfs configuration.
2016-07-11 16:16:07,811 INFO  [Thread-157] datanode.DataNode (DataNode.java:removeVolumes(674))
- Deactivating volumes (clear failure=true): /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl (FsDatasetImpl.java:removeVolumes(459))
- Removing /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
from FsDataset.
2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl (FsDatasetImpl.java:removeVolumes(463))
- removeVolumes of object hash:1508483764
2016-07-11 16:16:07,836 INFO  [Thread-157] datanode.BlockScanner (BlockScanner.java:removeVolumeScanner(243))
- Removing scanner for volume /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
(StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
2016-07-11 16:16:07,836 INFO  [VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
datanode.VolumeScanner (VolumeScanner.java:run(630)) - VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
2016-07-11 16:16:07,891 INFO  [IPC Server handler 7 on 63546] blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage [DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548
from DataNode127.0.0.1:63548
2016-07-11 16:16:07,908 INFO  [IPC Server handler 9 on 63546] blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:updateStorage(866)) - Adding new storage ID DS-f4df3404-9f02-470e-b202-75f5a4de29cb
for DN 127.0.0.1:63548
2016-07-11 16:16:08,845 INFO  [PacketResponder: BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001,
type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559))
- finalizeBlock of object hash:1903955157
2016-07-11 16:16:12,933 INFO  [DataXceiver for client  at /127.0.0.1:63574 [Receiving block
BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]] impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559))
- finalizeBlock of object hash:1903955157
{noformat}
The UT ran passed.

* Online
When dn.data.removeVolumes the thread run in FsVolumeImpl.closeAndWait() with dn.data lock
and wait referenceCount() = 0, but the other DataXceiver thread maybe blocked by dn.data lock
and with referencing volume. This can be happened like HDFS-9874.

  was:
UT: TestDataNodeHotSwapVolumes.
{code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
    dn.data = Mockito.spy(data);
    LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" + dn.data.hashCode());
    doAnswer(new Answer<Object>() {
          public Object answer(InvocationOnMock invocation)
              throws IOException, InterruptedException {
            Thread.sleep(1000);
            // Bypass the argument to FsDatasetImpl#finalizeBlock to verify that
            // the block is not removed, since the volume reference should not
            // be released at this point.
            data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
            return null;
          }
        }).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
{code}

{noformat}
2016-07-11 16:16:07,788 INFO  [Thread-0] datanode.TestDataNodeHotSwapVolumes (TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599))
- data hash:1903955157; dn.data hash:1508483764
2016-07-11 16:16:07,801 INFO  [Thread-157] datanode.DataNode (DataNode.java:reconfigurePropertyImpl(456))
- Reconfiguring dfs.datanode.data.dir to [DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
2016-07-11 16:16:07,810 WARN  [Thread-157] common.Util (Util.java:stringAsURI(56)) - Path
/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
should be specified as a URI in configuration files. Please update hdfs configuration.
2016-07-11 16:16:07,811 INFO  [Thread-157] datanode.DataNode (DataNode.java:removeVolumes(674))
- Deactivating volumes (clear failure=true): /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl (FsDatasetImpl.java:removeVolumes(459))
- Removing /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
from FsDataset.
2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl (FsDatasetImpl.java:removeVolumes(463))
- removeVolumes of object hash:1508483764
2016-07-11 16:16:07,836 INFO  [Thread-157] datanode.BlockScanner (BlockScanner.java:removeVolumeScanner(243))
- Removing scanner for volume /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
(StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
2016-07-11 16:16:07,836 INFO  [VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
datanode.VolumeScanner (VolumeScanner.java:run(630)) - VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
2016-07-11 16:16:07,891 INFO  [IPC Server handler 7 on 63546] blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage [DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548
from DataNode127.0.0.1:63548
2016-07-11 16:16:07,908 INFO  [IPC Server handler 9 on 63546] blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:updateStorage(866)) - Adding new storage ID DS-f4df3404-9f02-470e-b202-75f5a4de29cb
for DN 127.0.0.1:63548
2016-07-11 16:16:08,845 INFO  [PacketResponder: BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001,
type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559))
- finalizeBlock of object hash:1903955157
2016-07-11 16:16:12,933 INFO  [DataXceiver for client  at /127.0.0.1:63574 [Receiving block
BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]] impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559))
- finalizeBlock of object hash:1903955157
{noformat}


> Can not synchronized call method of object and Mockito.spy(object), So UT:testRemoveVolumeBeingWritten
passed but maybe deadlock online
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10605
>                 URL: https://issues.apache.org/jira/browse/HDFS-10605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2
>            Reporter: ade
>              Labels: test
>
> The UT: TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten can be ran successful,
but deadlock like HDFS-9874 maybe happen online.
> * UT: 
> {code:title=TestDataNodeHotSwapVolumes.java|borderStyle=solid}
> final FsDatasetSpi<? extends FsVolumeSpi> data = dn.data;
>     dn.data = Mockito.spy(data);
>     LOG.info("data hash:" + data.hashCode() + "; dn.data hash:" + dn.data.hashCode());
>     doAnswer(new Answer<Object>() {
>           public Object answer(InvocationOnMock invocation)
>               throws IOException, InterruptedException {
>             Thread.sleep(1000);
>             // Bypass the argument to FsDatasetImpl#finalizeBlock to verify that
>             // the block is not removed, since the volume reference should not
>             // be released at this point.
>             data.finalizeBlock((ExtendedBlock) invocation.getArguments()[0]);
>             return null;
>           }
>         }).when(dn.data).finalizeBlock(any(ExtendedBlock.class));
> {code}
> 2 thread can run synchronized method dn.data.removeVolumes and data.finalizeBlock concurrently
because dn.data(mocked) and data is not the same object.
> {noformat}
> 2016-07-11 16:16:07,788 INFO  [Thread-0] datanode.TestDataNodeHotSwapVolumes (TestDataNodeHotSwapVolumes.java:testRemoveVolumeBeingWrittenForDatanode(599))
- data hash:1903955157; dn.data hash:1508483764
> 2016-07-11 16:16:07,801 INFO  [Thread-157] datanode.DataNode (DataNode.java:reconfigurePropertyImpl(456))
- Reconfiguring dfs.datanode.data.dir to [DISK]file:/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
> 2016-07-11 16:16:07,810 WARN  [Thread-157] common.Util (Util.java:stringAsURI(56)) -
Path /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
should be specified as a URI in configuration files. Please update hdfs configuration.
> 2016-07-11 16:16:07,811 INFO  [Thread-157] datanode.DataNode (DataNode.java:removeVolumes(674))
- Deactivating volumes (clear failure=true): /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
> 2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl (FsDatasetImpl.java:removeVolumes(459))
- Removing /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
from FsDataset.
> 2016-07-11 16:16:07,836 INFO  [Thread-157] impl.FsDatasetImpl (FsDatasetImpl.java:removeVolumes(463))
- removeVolumes of object hash:1508483764
> 2016-07-11 16:16:07,836 INFO  [Thread-157] datanode.BlockScanner (BlockScanner.java:removeVolumeScanner(243))
- Removing scanner for volume /Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
(StorageID DS-f4df3404-9f02-470e-b202-75f5a4de29cb)
> 2016-07-11 16:16:07,836 INFO  [VolumeScannerThread(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1)]
datanode.VolumeScanner (VolumeScanner.java:run(630)) - VolumeScanner(/Users/ade/workspace/Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1,
DS-f4df3404-9f02-470e-b202-75f5a4de29cb) exiting.
> 2016-07-11 16:16:07,891 INFO  [IPC Server handler 7 on 63546] blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:pruneStorageMap(517)) - Removed storage [DISK]DS-f4df3404-9f02-470e-b202-75f5a4de29cb:NORMAL:127.0.0.1:63548
from DataNode127.0.0.1:63548
> 2016-07-11 16:16:07,908 INFO  [IPC Server handler 9 on 63546] blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:updateStorage(866)) - Adding new storage ID DS-f4df3404-9f02-470e-b202-75f5a4de29cb
for DN 127.0.0.1:63548
> 2016-07-11 16:16:08,845 INFO  [PacketResponder: BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001,
type=LAST_IN_PIPELINE, downstreams=0:[]] impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559))
- finalizeBlock of object hash:1903955157
> 2016-07-11 16:16:12,933 INFO  [DataXceiver for client  at /127.0.0.1:63574 [Receiving
block BP-1077872064-127.0.0.1-1468224964600:blk_1073741825_1001]] impl.FsDatasetImpl (FsDatasetImpl.java:finalizeBlock(1559))
- finalizeBlock of object hash:1903955157
> {noformat}
> The UT ran passed.
> * Online
> When dn.data.removeVolumes the thread run in FsVolumeImpl.closeAndWait() with dn.data
lock and wait referenceCount() = 0, but the other DataXceiver thread maybe blocked by dn.data
lock and with referencing volume. This can be happened like HDFS-9874.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message