hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10480) Add an admin command to list currently open files
Date Tue, 23 May 2017 00:20:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020452#comment-16020452 ]

Andrew Wang commented on HDFS-10480:

Thanks for working on this Manoj. Looks good overall!

One high-level question first: what do we envision as the use cases for this command? I figured
it was for:

# Debugging lease manager state
# Finding open files that are blocking decommission

To do the first, we probably shouldn't skip erroneous leases:

      if (!inodeFile.isUnderConstruction()) {
        LOG.warn("The file " + inodeFile.getFullPathName()
            + " is not under construction but has lease.");

The admin invoking the command also won't see this WARN since it goes to the NN log, not the
client log. The log is still a bit useful, but there should be some non-NN-log way for admins
to debug erroneous state here. I guess they can cross-check with fsck information?
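A rough sketch of the first point, assuming hypothetical {{OpenFileInfo}}, {{FileState}} and {{OpenFileLister}} types (these are not classes from the patch): instead of skipping a lease whose file is not under construction, the listing returns it with a flag, so the admin sees the inconsistency on the client side rather than only in the NN log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-file result sent back to the admin client. */
final class OpenFileInfo {
  final long inodeId;
  final String path;              // null if the inode could not be resolved
  final String clientName;        // lease holder
  final boolean leaseInconsistent; // lease exists but file is not under construction

  OpenFileInfo(long inodeId, String path, String clientName, boolean leaseInconsistent) {
    this.inodeId = inodeId;
    this.path = path;
    this.clientName = clientName;
    this.leaseInconsistent = leaseInconsistent;
  }
}

/** Hypothetical stand-in for an inode, holding only what this sketch needs. */
final class FileState {
  final String path;
  final boolean underConstruction;

  FileState(String path, boolean underConstruction) {
    this.path = path;
    this.underConstruction = underConstruction;
  }
}

final class OpenFileLister {
  /** Lists every leased file; erroneous leases are flagged instead of skipped. */
  static List<OpenFileInfo> listLeasedFiles(Map<Long, String> leaseHolderById,
                                            Map<Long, FileState> inodesById) {
    List<OpenFileInfo> result = new ArrayList<>();
    for (Map.Entry<Long, String> lease : leaseHolderById.entrySet()) {
      FileState file = inodesById.get(lease.getKey());
      // A missing inode or a file that is not under construction is reported, not dropped.
      boolean inconsistent = file == null || !file.underConstruction;
      String path = file == null ? null : file.path;
      result.add(new OpenFileInfo(lease.getKey(), path, lease.getValue(), inconsistent));
    }
    return result;
  }
}
```

The NN could still log the WARN; the flag just gives admins a way to spot the bad state without access to the NN log.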

For the second, the admin is wondering why some DN hasn't finished decommissioning yet, and wants
to find the under-construction (UC) blocks along with their client and path. It looks like
HDFS-11847 will make this easy, without needing to resort to fsck. Nice.

But what's the workflow where we need HDFS-11848? This new command is much lighter-weight
than {{fsck -openforwrite}}, so I'd like to encourage users to use the new command instead.
I'm just asking before we add new functionality.

Some review comments:

* Maybe bump the NUM_RESPONSES limit to 1000, to match {{DFS_LIST_LIMIT}}?
* Should the precondition on {{NUM_RESPONSES}} check for {{> 0}} rather than {{>= 0}}? FWIW,
{{0}} is also not a positive integer.
* Based on HDFS-9395, we should only generate an audit event when the op is successful, or
fails due to an ACE ({{AccessControlException}}). Notably, it should not log for things like an IOE.
* {{LeaseManager#getUnderConstructionFiles}} makes a new TreeMap out of {{leasesById}}. This
is potentially a lot of garbage. Can we make {{leasesById}} a TreeMap instead to avoid this?
TreeMaps still have pretty good performance.
* Can we also add an assert that the FSN read lock is held?
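To make the {{> 0}} precondition, the TreeMap suggestion and the read-lock check concrete, here is a minimal sketch. {{BatchedLeaseLister}}, {{getBatch}} and the {{ReentrantReadWriteLock}} are hypothetical stand-ins for {{LeaseManager}} and the FSN lock, not the patch's code; it uses an explicit check instead of a Java {{assert}} so that it fires even without {{-ea}}.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: leases kept sorted by inode id so batches need no copy. */
final class BatchedLeaseLister {
  private final TreeMap<Long, String> leasesById = new TreeMap<>(); // inode id -> holder
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  ReentrantReadWriteLock lock() {
    return lock;
  }

  void addLease(long inodeId, String holder) {
    lock.writeLock().lock();
    try {
      leasesById.put(inodeId, holder);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /**
   * Returns up to maxResponses leases whose inode id is strictly greater than
   * prevId. The caller must already hold the read (or write) lock.
   */
  List<Map.Entry<Long, String>> getBatch(long prevId, int maxResponses) {
    if (maxResponses <= 0) { // 0 is not a positive integer either
      throw new IllegalArgumentException(
          "maxResponses must be a positive integer: " + maxResponses);
    }
    if (lock.getReadHoldCount() == 0 && !lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("read lock must be held");
    }
    // tailMap returns a view, so no new TreeMap is built per call.
    NavigableMap<Long, String> tail = leasesById.tailMap(prevId, false);
    List<Map.Entry<Long, String>> batch = new ArrayList<>();
    for (Map.Entry<Long, String> e : tail.entrySet()) {
      if (batch.size() >= maxResponses) {
        break;
      }
      batch.add(new AbstractMap.SimpleImmutableEntry<>(e));
    }
    return batch;
  }
}
```

Because {{tailMap(prevId, false)}} is a view, each batch costs roughly a lookup plus the batch size, and the client resumes by passing the last inode id it received.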

* I like the step-up/step-down with the open and closed file sets. Could we take the verification
one step further and do it in a for-loop? That way we test every count from {{0..numOpenFiles}}
rather than just {{numOpenFiles}} and {{numOpenFiles/2}}.
* In {{testListOpenFilesInHA}}, it'd be nice to see what happens when there's a failover between
batches while iterating. I'd also suggest moving this test into {{TestListOpenFiles}}, since
it doesn't really relate to append.
* Do we have any tests for the {{HdfsAdmin}} API? It'd be better to test against this than
the one in {{DistributedFileSystem}}, since our end users will be programming against {{HdfsAdmin}}.
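A sketch of the for-loop verification, using a hypothetical in-memory {{FakeOpenFiles}} in place of a MiniDFSCluster and the {{HdfsAdmin}} listing. Only the loop shape is the point: the listing is checked at every count from 0 up to {{numOpenFiles}} and back down again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the NameNode's set of open files. */
final class FakeOpenFiles {
  private final TreeSet<String> open = new TreeSet<>();

  void open(String path) { open.add(path); }
  void close(String path) { open.remove(path); }
  List<String> list() { return new ArrayList<>(open); }
}

final class StepVerifier {
  /**
   * Opens files one at a time, checking the listing after every step, then
   * closes them in reverse order, checking again. Returns the number of checks.
   */
  static int verifyStepUpStepDown(FakeOpenFiles fs, int numOpenFiles) {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < numOpenFiles; i++) {
      paths.add("/dir/file-" + i);
    }
    int checks = 0;
    checkListing(fs, paths.subList(0, 0));
    checks++;
    for (int i = 0; i < numOpenFiles; i++) {   // step up: 1..numOpenFiles open
      fs.open(paths.get(i));
      checkListing(fs, paths.subList(0, i + 1));
      checks++;
    }
    for (int i = numOpenFiles - 1; i >= 0; i--) { // step down: numOpenFiles-1..0 open
      fs.close(paths.get(i));
      checkListing(fs, paths.subList(0, i));
      checks++;
    }
    return checks;
  }

  /** Compares as sets, since listing order is not part of the check. */
  private static void checkListing(FakeOpenFiles fs, List<String> expected) {
    if (!new TreeSet<>(expected).equals(new TreeSet<>(fs.list()))) {
      throw new AssertionError("expected " + expected + ", listed " + fs.list());
    }
  }
}
```

In the real test each check would go through the {{HdfsAdmin}} API, which also covers the last review point.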

> Add an admin command to list currently open files
> -------------------------------------------------
>                 Key: HDFS-10480
>                 URL: https://issues.apache.org/jira/browse/HDFS-10480
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10480.02.patch, HDFS-10480.03.patch, HDFS-10480.04.patch, HDFS-10480-trunk-1.patch,
> Currently there is no easy way to obtain the list of active leases or files being written.
> It will be nice if we have an admin command to list open files and their lease holders.

This message was sent by Atlassian JIRA
