hadoop-hdfs-dev mailing list archives

From Zhe Zhang <...@apache.org>
Subject Re: Listing large directories via WebHDFS
Date Wed, 19 Oct 2016 21:40:20 GMT
Thanks Xiao!

Seems like server-side throttling is still vulnerable to abuse by users
issuing large listing requests. Once such a request is scheduled, it will
keep listing potentially millions of files without having to go through
the IPC/RPC queue again. It does still have to compete for the FSNamesystem
lock, though, thanks to this server-side throttling logic.
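A minimal, hypothetical Java model of the behaviour described above (not the actual NameNode code): one request, once admitted, loops over fixed-size batches, taking a shared read lock for each batch and releasing it in between, but never going back through a request queue. ServerSideListing, lsLimit and lockAcquisitions are illustrative names; lsLimit only loosely plays the role of the NameNode's per-call listing limit (dfs.ls.limit).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a server-side listing loop; not NameNode code.
final class ServerSideListing {
    // Stand-in for the namespace lock (fair mode is an illustrative choice).
    private final ReentrantReadWriteLock namespaceLock = new ReentrantReadWriteLock(true);
    private final List<String> children; // sorted child names of one directory
    private final int lsLimit;           // max entries per batch (illustrative)
    int lockAcquisitions = 0;            // times this single request took the lock

    ServerSideListing(List<String> sortedChildren, int lsLimit) {
        if (lsLimit <= 0) {
            throw new IllegalArgumentException("lsLimit must be positive");
        }
        this.children = sortedChildren;
        this.lsLimit = lsLimit;
    }

    // One batch: entries strictly after startAfter, read under the lock.
    private List<String> getBatch(String startAfter) {
        namespaceLock.readLock().lock();
        try {
            lockAcquisitions++;
            int from = 0;
            if (startAfter != null) {
                int idx = Collections.binarySearch(children, startAfter);
                from = idx >= 0 ? idx + 1 : -idx - 1;
            }
            int to = Math.min(from + lsLimit, children.size());
            return new ArrayList<>(children.subList(from, to));
        } finally {
            namespaceLock.readLock().unlock(); // released between batches
        }
    }

    // The whole request: batch after batch, with no trip back through a request queue.
    List<String> listAll() {
        List<String> out = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> batch = getBatch(cursor);
            out.addAll(batch);
            if (batch.size() < lsLimit) {
                return out; // a short batch means the directory is exhausted
            }
            cursor = batch.get(batch.size() - 1);
        }
    }
}
```

With 2,500 children and lsLimit = 1000, listAll() takes the lock three times. Other lock users can get in between batches, but nothing bounds how long the request as a whole keeps working.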

On Wed, Oct 19, 2016 at 2:33 PM Xiao Chen <xiao@cloudera.com> wrote:

> Hi Zhe,
>
> As I understand it, the runner in WebHDFS goes to NamenodeWebHdfsMethods
> <https://github.com/apache/hadoop/blob/e9c4616b5e47e9c616799abc532269572ab24e6e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java#L972>,
> which eventually calls FSNamesystem#getListing. So it's still throttled on
> the NN side. The DDoS part is up for discussion...
>
> Also, Andrew added pagination support to WebHDFS/HttpFS in
> https://issues.apache.org/jira/browse/HDFS-10784 and
> https://issues.apache.org/jira/browse/HDFS-10823 to provide better
> control.
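For illustration only, a sketch of client-side pagination against a batch-listing endpoint. It assumes the LISTSTATUS_BATCH operation with a startAfter parameter and a remainingEntries count in each response, which is my understanding of what HDFS-10784 added to WebHDFS; please check the WebHDFS REST API documentation for your release. BatchListingIterator, Page and the fetch function (a stand-in for one HTTP GET plus JSON decoding) are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical client-side pager for a LISTSTATUS_BATCH-style endpoint; not the Hadoop client.
final class BatchListingIterator implements Iterator<String> {
    // One page as decoded by the client: child names plus the server's count of entries left.
    static final class Page {
        final List<String> names;
        final int remainingEntries;

        Page(List<String> names, int remainingEntries) {
            this.names = names;
            this.remainingEntries = remainingEntries;
        }
    }

    private final String baseUrl;               // e.g. http://<HOST>:<PORT>/webhdfs/v1/<PATH>
    private final Function<String, Page> fetch; // one HTTP GET + JSON decode (stubbed here)
    private Iterator<String> current = Collections.emptyIterator();
    private String lastName = null;             // startAfter cursor; null before the first page
    private boolean more = true;                // whether another page may exist
    int requests = 0;                           // HTTP requests issued so far

    BatchListingIterator(String baseUrl, Function<String, Page> fetch) {
        this.baseUrl = baseUrl;
        this.fetch = fetch;
    }

    static String pageUrl(String baseUrl, String startAfter) {
        String url = baseUrl + "?op=LISTSTATUS_BATCH";
        return startAfter == null
            ? url
            : url + "&startAfter=" + URLEncoder.encode(startAfter, StandardCharsets.UTF_8);
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page lazily, only once the current one is used up.
        while (!current.hasNext() && more) {
            Page page = fetch.apply(pageUrl(baseUrl, lastName));
            requests++;
            current = page.names.iterator();
            more = page.remainingEntries > 0 && !page.names.isEmpty();
            if (!page.names.isEmpty()) {
                lastName = page.names.get(page.names.size() - 1);
            }
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Because each page is a separate HTTP request, the server-side work per request is bounded by the page size, and the client only holds one page in memory at a time.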
>
> Best,
>
> -Xiao
>
> On Wed, Oct 19, 2016 at 2:08 PM, Zhe Zhang <zhz@apache.org> wrote:
>
> Hi,
>
> The regular HDFS client (DistributedFileSystem) throttles the workload of
> listing large directories by dividing the work into batches, roughly as
> follows:
> {code}
>     // fetch the first batch of entries in the directory
>     DirectoryListing thisListing = dfs.listPaths(
>         src, HdfsFileStatus.EMPTY_NAME);
>      ......
>     if (!thisListing.hasMore()) { // got all entries of the directory
>       FileStatus[] stats = new FileStatus[partialListing.length];
> {code}
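A self-contained sketch of the control flow in that snippet, using hypothetical stand-ins rather than the real DirectoryListing/HdfsFileStatus classes: ListingBatch models one listPaths() response, EMPTY_NAME mirrors HdfsFileStatus.EMPTY_NAME, and listPaths is passed in as a function. The first call starts from the empty name; if nothing more remains, that batch is returned directly, otherwise the client issues one call per batch, each starting after the last name it has seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-ins for the DFS client's batched listing; not Hadoop classes.
final class BatchedClientListing {
    static final String EMPTY_NAME = ""; // mirrors HdfsFileStatus.EMPTY_NAME: start at the beginning

    // One listPaths() response: a partial listing plus whether the server stopped early.
    static final class ListingBatch {
        final String[] partialListing;
        final boolean hasMore;

        ListingBatch(String[] partialListing, boolean hasMore) {
            this.partialListing = partialListing;
            this.hasMore = hasMore;
        }
    }

    // listPaths(src, startAfter) stands in for one getListing RPC to the NameNode.
    static String[] listStatus(String src, BiFunction<String, String, ListingBatch> listPaths) {
        // fetch the first batch of entries in the directory
        ListingBatch thisListing = listPaths.apply(src, EMPTY_NAME);
        if (!thisListing.hasMore) { // got all entries of the directory
            return thisListing.partialListing.clone();
        }
        // large directory: keep asking for the next batch after the last name seen
        List<String> all = new ArrayList<>(Arrays.asList(thisListing.partialListing));
        while (thisListing.hasMore && thisListing.partialListing.length > 0) {
            String lastName = thisListing.partialListing[thisListing.partialListing.length - 1];
            thisListing = listPaths.apply(src, lastName);
            all.addAll(Arrays.asList(thisListing.partialListing));
        }
        return all.toArray(new String[0]);
    }
}
```

In the real client each of these calls is a separate RPC, so every batch goes back through the NameNode's RPC queue; the final result is still fully materialized on the client.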
>
> However, WebHDFS doesn't seem to have this batching logic.
> {code}
>   @Override
>   public FileStatus[] listStatus(final Path f) throws IOException {
>     final HttpOpParam.Op op = GetOpParam.Op.LISTSTATUS;
>     return new FsPathResponseRunner<FileStatus[]>(op, f) {
>       @Override
>       FileStatus[] decodeResponse(Map<?,?> json) {
>           ....
>       }
>     }.run();
>   }
> {code}
>
> Am I missing anything? Does that mean a user can DDoS the NameNode by
> running {{hadoop fs -ls -R /}} via WebHDFS?
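A rough, hypothetical cost model for a recursive listing (a back-of-the-envelope sketch, not a measurement): the namespace is an in-memory map from directory path to child names, and lsRecursive counts listing calls and the largest single response. batchLimit <= 0 models a one-shot LISTSTATUS, where each directory comes back in one response however large; a positive value models batched listing with at most that many entries per call, assuming the server reports exactly when nothing remains.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical cost model for "ls -R"; the namespace is a map of directory path -> child names.
final class RecursiveListingCost {
    static final class Cost {
        long calls;             // listing calls issued
        long maxEntriesPerCall; // largest single response
    }

    // batchLimit <= 0 models a one-shot listing (one response per directory, however large).
    static Cost lsRecursive(Map<String, List<String>> tree, String root, int batchLimit) {
        Cost cost = new Cost();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            List<String> children = tree.getOrDefault(dir, List.of());
            int n = children.size();
            if (batchLimit <= 0) {
                cost.calls += 1;
                cost.maxEntriesPerCall = Math.max(cost.maxEntriesPerCall, n);
            } else {
                // ceil(n / batchLimit) calls, but at least one even for an empty directory
                cost.calls += Math.max(1, (n + batchLimit - 1) / batchLimit);
                cost.maxEntriesPerCall = Math.max(cost.maxEntriesPerCall, Math.min(n, batchLimit));
            }
            for (String child : children) {
                String path = dir.equals("/") ? "/" + child : dir + "/" + child;
                if (tree.containsKey(path)) {
                    pending.push(path); // only directories are keys in the map
                }
            }
        }
        return cost;
    }
}
```

For a root holding a directory /a with 2,500 files, the one-shot model needs 2 calls but one response carries all 2,500 entries; with batchLimit = 1000 it needs 4 calls and no response exceeds 1,000 entries.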
>
>
>
