hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
Date Thu, 17 Apr 2014 00:55:22 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972160#comment-13972160 ]

Colin Patrick McCabe commented on HDFS-4239:

Thanks for picking this up again, [~jxiang].  {{hdfs-4239_v5.patch}} did not apply cleanly
to trunk for me; can you re-generate this patch?

{{DataNode#checkSuperuserPrivilege}}: As Yongjun commented, it's kind of unfortunate that
you are skipping this check when Kerberos is disabled.  This will make unit testing the "permission
denied" case harder than it has to be.  I suggest using {{FSPermissionChecker}}.  The constructor
takes three arguments: the current UGI, the superuser, and the supergroup; by calling {{FSPermissionChecker#checkSuperuserPrivilege}},
you can figure out whether the caller has superuser privileges.  I realize you were following the pattern in {{checkBlockLocalPathAccess}},
but that code is legacy now (only used in the legacy local block reader).
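
To make this concrete, here's a rough sketch of what such a check could look like (the exact constructor signature and visibility on trunk may differ, and the superuser/supergroup values would come from the DN configuration):

{code}
import java.io.IOException;

import org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch only: constructor argument order/visibility may differ on trunk.
static void checkSuperuserPrivilege(String superUser, String superGroup)
    throws IOException {
  UserGroupInformation callerUgi = UserGroupInformation.getCurrentUser();
  FSPermissionChecker pc =
      new FSPermissionChecker(superUser, superGroup, callerUgi);
  // Throws AccessControlException if the caller is neither the superuser
  // nor in the supergroup -- behaves the same whether or not Kerberos is on.
  pc.checkSuperuserPrivilege();
}
{code}

That keeps the "permission denied" case easy to hit from a unit test even when security is disabled.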

{{VERIFICATION_PREFIX}}: As Yongjun commented, I think you meant to keep the "private" modifier on this field.

{{DataStorage.java}}: the only change here is a whitespace change.  Seems like you meant to
cut this out of the diff.

{{BlockPoolSliceScanner}}: I don't think the file reopening belongs in the slice scanner.
{{LogFileHandler}} is actually a pretty abstract interface which is unconcerned with the
details of where things are in files.  We get it out of {{FsDatasetSpi#createRollingLogs}}.
Since the normal file rolling stuff is handled in {{RollingLogsImpl}}, that's where the re-opening
of verification logs should be handled as well.  It's the same kind of thing.

Another way of thinking about it is: while it's true that currently the verification logs
reside in files somewhere in directories on disk, this is an implementation detail.  You can
easily imagine a different implementation of {{FsDatasetSpi}} where there are no on-disk directories,
or where the verification log is kept in memory.

Also, I noticed you were trying to copy the old verification log from the failed disk in your
{{relocateVerificationLogs}} function.  This is not a good idea, since reading from a failed
disk may hang or misbehave, causing issues with the DataNode's threads.  We want to treat
a failed disk as radioactive and not do any more reads or writes from there if we can help it.
@@ -930,12 +940,15 @@ synchronized Packet waitForAckHead(long seqno) throws InterruptedException
     public synchronized void close() {
-      while (isRunning() && ackQueue.size() != 0) {
-        try {
-          wait();
-        } catch (InterruptedException e) {
-          running = false;
-          Thread.currentThread().interrupt();
+      try {
+        while (isRunning() && ackQueue.size() != 0) {
+          wait(runningCheckInterval);
+        }
+      } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+      } catch (IOException ioe) {
+        if(LOG.isDebugEnabled()) {
+          LOG.debug(myString, ioe);

Why are we catching and swallowing {{IOException}} here?

+  @Override //FsDatasetSpi
+  public FsVolumeImpl markDownVolume(File location

I don't like the assumption here that our volumes are on files.  This may not be true for
all {{FsDatasetSpi}} implementations.  Instead, how about changing this to take a URI?

Similarly, let's change the user API (in DistributedFileSystem, etc.) to take a URI as well,
all the way up to the user level.  So the user can ask us to mark down {{file:///data/1}} or
something like that (see the sketch further down).  That way, when we later implement volumes
which aren't backed by files, we can easily refer to them.

Also, how do you feel about {{disableVolume}} instead of {{markDownVolume}}?  "markdown" just
makes me think of the markup language (maybe that's just me?)

I think we need a way of listing all the volume URIs on a particular DN, and a way of listing
all the currently disabled volume URIs on a DN.  Otherwise it makes life hard for sysadmins.
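
To make that concrete, here is a rough sketch of the kind of interface I'm imagining (purely hypothetical -- none of these methods exist yet, and the names are just illustrations):

{code}
import java.io.IOException;
import java.net.URI;
import java.util.List;

// Hypothetical sketch only: none of these methods exist today.  Identifying
// volumes by URI (e.g. file:///data/1) keeps the API valid for FsDatasetSpi
// implementations whose volumes are not local files.
public interface VolumeAdmin {
  /** Stop using the volume identified by this URI. */
  void disableVolume(URI volume) throws IOException;

  /** Start using a previously disabled volume again, e.g. after replacement. */
  void enableVolume(URI volume) throws IOException;

  /** URIs of all volumes configured on this DataNode. */
  List<URI> listVolumes() throws IOException;

  /** URIs of the volumes that are currently disabled. */
  List<URI> listDisabledVolumes() throws IOException;
}
{code}

The same URIs could then be surfaced at the user level (DistributedFileSystem, etc.) so an operator can both mark a volume down and see what is already marked down.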

Another comment: we need to notify outstanding {{BlockReaderLocal}} instances that the volume
is disabled.  We can do this by using the shared memory segment.  This should avoid the problem
that JD pointed out here:

bq. We tried it today, it worked fine. We did encounter an interesting problem tho, the region
server on the same node continued to use that disk directly since it's configured with local

If you want to do this in a follow-up change, can you file a JIRA for it?

> Means of telling the datanode to stop using a sick disk
> -------------------------------------------------------
>                 Key: HDFS-4239
>                 URL: https://issues.apache.org/jira/browse/HDFS-4239
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jimmy Xiang
>         Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, hdfs-4239_v4.patch,
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally,
> or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 disks of data,
> especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed
> datanode's data can be pretty disruptive, especially if the cluster is doing low latency serving:
> e.g. hosting an hbase cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't unmount
> the disk while it is in use).  This latter is better in that only the bad disk's data is rereplicated,
> not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop using a
> disk an operator has designated 'bad'.  This would be like option #2 above minus the need
> to stop and restart the datanode.  Ideally the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a disk after it's
> been replaced.
