hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-10480) Add an admin command to list currently open files
Date Wed, 17 May 2017 21:30:04 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Manoj Govindassamy updated HDFS-10480:
    Attachment: HDFS-10480.02.patch

Thanks for the review, [~andrew.wang] and [~kihwal]. I have attached the v02 patch, which addresses the points below.
Can you please take a look?

bq. Is there a reason for dumping the info to a file on the NN? This makes it more difficult
for admins to get the information, and is more complicated than just printing it out on the
command line. Allowing a user-specified name that isn't validated is also a possible security
issue. This also means normal users can't use this, since they won't have access to the NN's
log directory.
The design has changed. The client now gets a RemoteIterator over the open files, and the list
is retrieved from the NameNode in batches. The fetch batch size is configurable. This lightweight
model lets the NameNode serve even a very large list without strain.
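The batched listing described above can be sketched as follows. This is a minimal, self-contained Java illustration under stated assumptions, not the code in HDFS-10480.02.patch. `RemoteIterator` here is a local stand-in shaped like Hadoop's `org.apache.hadoop.fs.RemoteIterator`. `OpenFile`, `OpenFilesSource`, `FakeNameNode`, and the `(prevId, batchSize)` cursor protocol are hypothetical names chosen for the example. The client requests the next batch only after it has drained the current one, so the NameNode never builds the whole list in a single response.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class BatchedOpenFilesSketch {

  /** Local stand-in shaped like Hadoop's RemoteIterator: both calls may throw IOException. */
  interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
  }

  /** Hypothetical open-file record: inode id, path, and lease holder. */
  record OpenFile(long inodeId, String path, String leaseHolder) {}

  /** One page of results plus a flag saying whether more entries remain. */
  record Batch(List<OpenFile> entries, boolean hasMore) {}

  /** Hypothetical NameNode call: up to batchSize entries with inodeId > prevId. */
  interface OpenFilesSource {
    Batch listOpenFiles(long prevId, int batchSize) throws IOException;
  }

  /** Client-side iterator that fetches a new batch only when the current one is used up. */
  static class BatchedIterator implements RemoteIterator<OpenFile> {
    private final OpenFilesSource source;
    private final int batchSize;
    private List<OpenFile> current = List.of();
    private int pos = 0;
    private long lastId = -1;   // cursor: id of the last entry of the previous batch
    private boolean more = true;

    BatchedIterator(OpenFilesSource source, int batchSize) {
      if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
      this.source = source;
      this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() throws IOException {
      while (pos >= current.size() && more) {
        Batch b = source.listOpenFiles(lastId, batchSize);
        current = b.entries();
        pos = 0;
        // An empty batch ends iteration, which guards against looping forever.
        more = b.hasMore() && !current.isEmpty();
        if (!current.isEmpty()) lastId = current.get(current.size() - 1).inodeId();
      }
      return pos < current.size();
    }

    @Override
    public OpenFile next() throws IOException {
      if (!hasNext()) throw new NoSuchElementException();
      return current.get(pos++);
    }
  }

  /** Simulated NameNode holding open files sorted by inode id; counts RPCs. */
  static class FakeNameNode implements OpenFilesSource {
    private final TreeMap<Long, OpenFile> open = new TreeMap<>();
    int calls = 0;

    void add(OpenFile f) { open.put(f.inodeId(), f); }

    @Override
    public Batch listOpenFiles(long prevId, int batchSize) {
      calls++;
      List<OpenFile> out = new ArrayList<>();
      for (OpenFile f : open.tailMap(prevId, false).values()) {
        if (out.size() == batchSize) break;
        out.add(f);
      }
      boolean hasMore = !out.isEmpty()
          && open.higherKey(out.get(out.size() - 1).inodeId()) != null;
      return new Batch(out, hasMore);
    }
  }

  public static void main(String[] args) throws IOException {
    FakeNameNode nn = new FakeNameNode();
    for (long id = 1; id <= 5; id++) {
      nn.add(new OpenFile(id, "/logs/app-" + id, "DFSClient_" + id));
    }
    RemoteIterator<OpenFile> it = new BatchedIterator(nn, 2);
    while (it.hasNext()) {
      OpenFile f = it.next();
      System.out.println(f.path() + " held by " + f.leaseHolder());
    }
    System.out.println("RPCs: " + nn.calls);
  }
}
```

With five open files and a batch size of 2, the client makes three calls (2 + 2 + 1 entries). The sketch keys the cursor on a monotonically increasing id rather than an offset. That choice, which is an assumption of the sketch, keeps the remaining entries from shifting when files are closed between batches.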

bq. Let's not change the import to a wildcard, it makes backports harder.

bq. Shouldn't this only go to the active NN, since it has up-to-date info about writers? This
is in reference to the Operation.UNCHECKED and the HA logic in DFSAdmin.

bq. Nit: "getUnderconstructionFiles" -> "getUnderConstructionFiles"

bq. Could you also add a Java API to HdfsAdmin?

bq. One more thing that would be nice here is to filter the output on a passed path or DN.
Use cases: An admin might already know a stale file by path (perhaps from fsck's -openforwrite),
and wants to figure out who the lease holder is. A DN might be blocked from decommissioning
by an open-for-write file, and the admin wants to figure out what files those are.
bq. With thousand+ node clusters, where you might be adding and removing machines regularly
for maintenance, a huge use case on top of the directory filter would be a "which open files
are blocking server decommissioning" filter (identify files with blocks on hosts that are
currently in decommissioning state).
The attached patch provides the infrastructure needed for these enhancements. To keep the
patch small and easy to backport, I can take them up in a new JIRA, if you are OK with that.
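The path and DataNode filters requested above could be layered on top of such a listing. Here is a hedged Java sketch of that idea. Everything in it is hypothetical: the `OpenFile` record (which carries a list of hosts storing the file's blocks) and the helper names `underPath`, `blockingDecommission`, and `filter` are assumptions for the example. None of them are part of the patch or of `HdfsAdmin`. For simplicity the sketch filters on the client. Filtering on the NameNode instead would avoid sending unrelated entries over RPC.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class OpenFilesFilterSketch {

  /** Hypothetical open-file entry: path, lease holder, and the hosts storing its blocks. */
  record OpenFile(String path, String leaseHolder, List<String> blockHosts) {}

  /** Matches files at or below dir; "/user/etl" matches "/user/etl/x" but not "/user/etl2/x". */
  static Predicate<OpenFile> underPath(String dir) {
    String prefix = dir.endsWith("/") ? dir : dir + "/";
    return f -> f.path().equals(dir) || f.path().startsWith(prefix);
  }

  /** Matches files with at least one block on a host that is being decommissioned. */
  static Predicate<OpenFile> blockingDecommission(Set<String> decommissioningHosts) {
    return f -> f.blockHosts().stream().anyMatch(decommissioningHosts::contains);
  }

  /** Applies a predicate to a list of entries, keeping their original order. */
  static List<OpenFile> filter(List<OpenFile> files, Predicate<OpenFile> p) {
    return files.stream().filter(p).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Sample data; in practice the entries would come from the batched listing.
    List<OpenFile> files = List.of(
        new OpenFile("/user/etl/part-0", "DFSClient_A", List.of("dn1", "dn2")),
        new OpenFile("/user/etl2/part-1", "DFSClient_B", List.of("dn3")),
        new OpenFile("/tmp/app.log", "DFSClient_C", List.of("dn2", "dn4")));

    // Use case 1: who holds the lease on files under a known stale directory?
    for (OpenFile f : filter(files, underPath("/user/etl"))) {
      System.out.println("lease: " + f.path() + " -> " + f.leaseHolder());
    }

    // Use case 2: which open files are blocking decommissioning of dn2?
    for (OpenFile f : filter(files, blockingDecommission(Set.of("dn2")))) {
      System.out.println("blocking dn2: " + f.path());
    }
  }
}
```

`underPath` checks the path-component boundary, so `/user/etl` does not match `/user/etl2`. The two predicates can be combined with `Predicate.and` to answer both questions in one pass.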

> Add an admin command to list currently open files
> -------------------------------------------------
>                 Key: HDFS-10480
>                 URL: https://issues.apache.org/jira/browse/HDFS-10480
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10480.02.patch, HDFS-10480-trunk-1.patch, HDFS-10480-trunk.patch
> Currently there is no easy way to obtain the list of active leases or files being written.
> It will be nice if we have an admin command to list open files and their lease holders.

This message was sent by Atlassian JIRA
