hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9908) Datanode should tolerate disk failure during NN handshake
Date Fri, 04 Mar 2016 21:03:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180534#comment-15180534 ]

Wei-Chiu Chuang commented on HDFS-9908:
---------------------------------------

For completeness, here are the related logs from the DN:

*DN connects to NN:*
2016-02-18 02:20:37,949 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server
at /10.107.162.126:50020
2016-02-18 02:20:38,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request
received for nameservices: nameservice1
2016-02-18 02:20:38,067 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices
for nameservices: nameservice1
2016-02-18 02:20:38,076 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering>
(Datanode Uuid unassigned) service to namenode1.weichiu.com/10.107.162.110:8022 starting to
offer service
2016-02-18 02:20:38,077 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering>
(Datanode Uuid unassigned) service to namenode2.weichiu.com/10.107.162.120:8022 starting to
offer service
2016-02-18 02:20:38,085 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-02-18 02:20:38,085 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2016-02-18 02:20:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode1.weichiu.com/10.107.162.110:8022.
Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)

*Then the DN does the handshake, gets the bpid from the NN, and then analyzes storage:*

2016-02-18 02:20:53,512 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/1/dfs/dn/in_use.lock
acquired by nodename 5124@namenode1.weichiu.com
2016-02-18 02:20:53,563 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
directories for bpid BP-1018136951-49.4.167.110-1403564146510
2016-02-18 02:20:53,563 INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
2016-02-18 02:20:53,606 INFO org.apache.hadoop.hdfs.server.common.Storage: Restored 0 block
files from trash.
....
All of the disks are analyzed successfully.

*But one of them fails to scan:*

2016-02-18 02:23:36,224 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while scanning /data/8/dfs/dn/current. Will throw later.
ExitCodeException exitCode=1: du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909_14954023.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir69/blk_1093551560': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir69/blk_1093551560_19818947.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir116/blk_1093563577': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir116/blk_1093563577_19830979.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552125': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093551897': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093551897_19819284.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552003_19819390.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552003': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir71/blk_1093552125_19819512.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir70/blk_1093551747': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir70/blk_1093551747_19819134.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir68/blk_1093551249_19818632.meta': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir46/subdir68/blk_1093551249': Input/output error

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.fs.DU.run(DU.java:190)
        at org.apache.hadoop.fs.DU.<init>(DU.java:70)
        at org.apache.hadoop.fs.DU.<init>(DU.java:95)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:116)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:284)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:247)
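
Side note: the ExitCodeException above is Hadoop's Shell utility reporting a nonzero exit code from the {{du}} command that {{BlockPoolSlice}} runs at construction time (see the {{DU}} frames in the stack). A minimal sketch of that behavior, with an example directory path (any directory containing unreadable files reproduces the error):

{code:title=DuProbe.java}
import java.io.IOException;

import org.apache.hadoop.util.Shell;

/**
 * Illustration only: how a failing `du -sk` surfaces as an IOException.
 * The directory path is just an example.
 */
public class DuProbe {
  public static void main(String[] args) throws IOException {
    String dir = args.length > 0 ? args[0] : "/data/8/dfs/dn/current";

    // Roughly what DU does internally: fork "du -sk <dir>" via the Shell utility.
    Shell.ShellCommandExecutor du =
        new Shell.ShellCommandExecutor(new String[] {"du", "-sk", dir});
    try {
      du.execute();                       // throws if du exits nonzero
      System.out.println("du output: " + du.getOutput().trim());
    } catch (Shell.ExitCodeException e) {
      // Same shape as the datanode log: "ExitCodeException exitCode=1: du:
      // cannot access ...". It extends IOException, so BlockPoolSlice's
      // constructor simply propagates it up to FsVolumeList#addBlockPool.
      System.err.println("du failed, exit code " + e.getExitCode());
      throw e;
    }
  }
}
{code}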

*At this point, the DN attempts the handshake with the NN again, only to find the disks are already in use due to the lock files:*

2016-02-18 02:23:36,238 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
[DISK]file:/data/1/dfs/dn/ has already been used.
2016-02-18 02:23:36,273 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
directories for bpid BP-1018136951-49.4.167.110-1403564146510
2016-02-18 02:23:36,273 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze
storage directories for block pool BP-1018136951-49.4.167.110-1403564146510
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used
block storage: /data/1/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:391)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:472)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
        at java.lang.Thread.run(Thread.java:745)
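
For reference, the in_use.lock seen earlier (02:20:53) is a plain file lock on the storage directory, and it is not released when the failed initialization aborts, so the directory still looks "in use" when the retried handshake tries to load it. A rough plain-Java illustration of that style of locking (not the HDFS Storage code; the path is hypothetical):

{code:title=InUseLockDemo.java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/**
 * Illustration only: a storage directory guarded by a file lock on
 * in_use.lock. The directory path is hypothetical.
 */
public class InUseLockDemo {
  public static void main(String[] args) throws IOException {
    File storageDir = new File(args.length > 0 ? args[0] : "/tmp/dn-storage");
    storageDir.mkdirs();
    File lockFile = new File(storageDir, "in_use.lock");

    // First claim on the directory: acquire the lock and keep holding it.
    RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
    FileLock lock = raf.getChannel().tryLock();
    System.out.println("first attempt:  " + (lock != null ? "acquired" : "held by another process"));

    // Second claim, e.g. a retried initialization: tryLock() returns null if
    // another process holds the lock, and throws OverlappingFileLockException
    // if this same JVM still holds it -- either way the directory is "in use".
    try (RandomAccessFile raf2 = new RandomAccessFile(lockFile, "rws")) {
      FileLock again = raf2.getChannel().tryLock();
      System.out.println("second attempt: " + (again != null ? "acquired" : "held by another process"));
    } catch (OverlappingFileLockException e) {
      System.out.println("second attempt: already locked by this JVM");
    }

    if (lock != null) {
      lock.release();
    }
    raf.close();
  }
}
{code}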

*Finally, it thinks all of the disks are in use and cannot complete the second handshake:*
2016-02-18 02:23:36,558 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization
failed for Block pool <registering> (Datanode Uuid unassigned) service to namenode1.weichiu.com/10.107.162.120:8022.
Exiting.
java.io.IOException: All specified directories are failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:473)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1322)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1292)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:862)
        at java.lang.Thread.run(Thread.java:745)
2016-02-18 02:23:36,559 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID
needed, but service not yet registered with NN
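
The logs show the core problem: one bad volume during {{FsVolumeList#addBlockPool}} aborts the whole handshake. One possible direction, sketched below, is to collect per-volume failures and only abort when more volumes fail than we are willing to tolerate (in the spirit of {{dfs.datanode.failed.volumes.tolerated}}). This is a standalone sketch only, not a patch against the real classes; {{Volume}}, {{toleratedFailures}} and the exclusion step are placeholders for the datanode's existing volume-removal machinery.

{code:title=TolerantBlockPoolAdder.java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Standalone sketch of the idea behind the fix (not HDFS code): scan each
 * volume in its own thread, collect per-volume failures, and only abort when
 * too many volumes fail, instead of rethrowing the first exception and
 * failing the NN handshake outright.
 */
public class TolerantBlockPoolAdder {

  /** Hypothetical stand-in for a datanode storage volume. */
  interface Volume {
    String name();
    void addBlockPool(String bpid) throws IOException;  // may hit a bad disk
  }

  static void addBlockPool(List<Volume> volumes, final String bpid,
                           int toleratedFailures) throws IOException {
    final List<IOException> exceptions =
        Collections.synchronizedList(new ArrayList<IOException>());
    final List<Volume> failedVolumes =
        Collections.synchronizedList(new ArrayList<Volume>());
    List<Thread> threads = new ArrayList<Thread>();

    for (final Volume v : volumes) {
      Thread t = new Thread() {
        public void run() {
          try {
            v.addBlockPool(bpid);
          } catch (IOException ioe) {
            // Remember the failure, but let the other volumes finish.
            exceptions.add(ioe);
            failedVolumes.add(v);
          }
        }
      };
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException ie) {
        throw new IOException(ie);
      }
    }

    if (!failedVolumes.isEmpty()
        && (failedVolumes.size() > toleratedFailures
            || failedVolumes.size() == volumes.size())) {
      // Too many bad disks: fail the block pool initialization as before.
      throw new IOException(failedVolumes.size() + " volume(s) failed to load "
          + bpid, exceptions.get(0));
    }
    for (Volume v : failedVolumes) {
      // In the real datanode this would go through the existing
      // volume-removal path (release in_use.lock, update metrics, etc.).
      System.err.println("Excluding failed volume " + v.name());
    }
  }
}
{code}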

> Datanode should tolerate disk failure during NN handshake
> ---------------------------------------------------------
>
>                 Key: HDFS-9908
>                 URL: https://issues.apache.org/jira/browse/HDFS-9908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0
>         Environment: CDH5.3.3
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>
> A DN may treat a disk failure exception as an NN handshake exception, and this can prevent the DN from joining a cluster even if most of its disks are healthy.
> During the NN handshake, the DN initializes its block pools. It creates a lock file per disk and then scans the volumes. However, if the scan throws an exception due to a disk failure, the DN assumes the exception means the NN is inconsistent with the local storage (see {{DataNode#initBlockPool}}). As a result, it attempts to reconnect to the NN again.
> However, at this point the DN has not deleted its lock files on the disks. When it reconnects to the NN, it sees the same disks as already in use, so the handshake fails again because none of the disks can be used (due to locking), and this repeats indefinitely. It happens even if the DN has multiple disks and only one of them fails: the DN cannot connect to the NN despite having just one failing disk. Note that it is possible to successfully create a lock file on a disk and then hit an error while scanning that disk.
> We saw this on a CDH 5.3.3 cluster (which is based on Apache Hadoop 2.5.0; the same bug is still present on the 3.0.0 trunk branch). The root cause is that the DN treats an internal error (a single disk failure) as an external one (an NN handshake failure), and we should fix it.
> {code:title=DataNode.java}
> /**
>    * One of the Block Pools has successfully connected to its NN.
>    * This initializes the local storage for that block pool,
>    * checks consistency of the NN's cluster ID, etc.
>    * 
>    * If this is the first block pool to register, this also initializes
>    * the datanode-scoped storage.
>    * 
>    * @param bpos Block pool offer service
>    * @throws IOException if the NN is inconsistent with the local storage.
>    */
>   void initBlockPool(BPOfferService bpos) throws IOException {
>     NamespaceInfo nsInfo = bpos.getNamespaceInfo();
>     if (nsInfo == null) {
>       throw new IOException("NamespaceInfo not found: Block pool " + bpos
>           + " should have retrieved namespace info before initBlockPool.");
>     }
>     
>     setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());
>     // Register the new block pool with the BP manager.
>     blockPoolManager.addBlockPool(bpos);
>     
>     // In the case that this is the first block pool to connect, initialize
>     // the dataset, block scanners, etc.
>     initStorage(nsInfo);
>     // Exclude failed disks before initializing the block pools to avoid startup
>     // failures.
>     checkDiskError();
>     data.addBlockPool(nsInfo.getBlockPoolID(), conf);  <----- this line throws the disk error exception
>     blockScanner.enableBlockPoolId(bpos.getBlockPoolId());
>     initDirectoryScanner(conf);
>   }
> {code}
> {{FsVolumeList#addBlockPool}} is the source of the exception.
> {code:title=FsVolumeList.java}
>   void addBlockPool(final String bpid, final Configuration conf) throws IOException {
>     long totalStartTime = Time.monotonicNow();
>     
>     final List<IOException> exceptions = Collections.synchronizedList(
>         new ArrayList<IOException>());
>     List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>     for (final FsVolumeImpl v : volumes) {
>       Thread t = new Thread() {
>         public void run() {
>           try (FsVolumeReference ref = v.obtainReference()) {
>             FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>                 " on volume " + v + "...");
>             long startTime = Time.monotonicNow();
>             v.addBlockPool(bpid, conf);
>             long timeTaken = Time.monotonicNow() - startTime;
>             FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>                 " on " + v + ": " + timeTaken + "ms");
>           } catch (ClosedChannelException e) {
>             // ignore.
>           } catch (IOException ioe) {
>             FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>                 ". Will throw later.", ioe);
>             exceptions.add(ioe);
>           }
>         }
>       };
>       blockPoolAddingThreads.add(t);
>       t.start();
>     }
>     for (Thread t : blockPoolAddingThreads) {
>       try {
>         t.join();
>       } catch (InterruptedException ie) {
>         throw new IOException(ie);
>       }
>     }
>     if (!exceptions.isEmpty()) {
>       throw exceptions.get(0); <----- here is the origin of the exception
>     }
>     
>     long totalTimeTaken = Time.monotonicNow() - totalStartTime;
>     FsDatasetImpl.LOG.info("Total time to scan all replicas for block pool " +
>         bpid + ": " + totalTimeTaken + "ms");
>   }
> {code}



