hadoop-hdfs-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: DistributedFileSystem.listStatus() - Why does it do partial listings then assemble?
Date Fri, 03 May 2013 17:14:25 GMT
On 2 May 2013 09:28, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Brad,
> The reasoning is that the NameNode locking is somewhat coarse grained. In
> older versions of Hadoop, before it worked this way, we found that listing
> large directories (eg with 100k+ files) could end up holding the namenode's
> lock for a quite long period of time and starve other clients.
> Additionally, I believe there is a second API that does the "on-demand"
> fetching of the next set of files from the listing as well, no?

It's in HDFS v2; it is the only incompatible change in the FileSystem class between v1 and v2.

It is chatty over long-haul links and hangs against Amazon S3://, an issue for
which there's a patch to replicate, but not fix, the problem.

It is good locally, but I think it needs test coverage for all the other
filesystem clients that ship w/ Hadoop.

FWIW, blobstores do tend to only support paged lists of their blobs, so the
same build-up-as-you-go-along process works there. We should spell out in
the documentation "changes that occur to the filesystem during the
generation of this list MAY not be reflected in the result, and so MAY
result in a partially incomplete or inconsistent view".
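
To make that caveat concrete, here is a minimal sketch of a paged listing
assembled on the client side. It is not HDFS code: ToyNameNode, list_page,
list_status and the between_pages hook are hypothetical names invented for
this illustration, and the single lock only stands in for a coarse-grained
server lock. The point is that the lock is held for one page at a time, so a
change made between two pages can leave the assembled list matching neither
the directory as it was at the start nor as it is at the end.

```python
import threading
from bisect import bisect_right, insort


class ToyNameNode:
    """Hypothetical server: one directory guarded by one coarse lock."""

    def __init__(self, names):
        self._names = sorted(names)
        self._lock = threading.Lock()

    def list_page(self, start_after, limit):
        """Return up to `limit` names sorted after `start_after`, plus has_more."""
        with self._lock:  # held for a single page only, not the whole directory
            i = bisect_right(self._names, start_after)
            page = self._names[i:i + limit]
            has_more = i + limit < len(self._names)
            return page, has_more

    def create(self, name):
        with self._lock:
            insort(self._names, name)

    def delete(self, name):
        with self._lock:
            self._names.remove(name)

    def snapshot(self):
        with self._lock:
            return list(self._names)


def list_status(server, page_size, between_pages=None):
    """Client side: fetch page by page and assemble the full listing."""
    result = []
    cursor = ""  # assumption: every name sorts after the empty string
    while True:
        page, has_more = server.list_page(cursor, page_size)
        result.extend(page)
        if not page or not has_more:
            return result
        cursor = page[-1]
        if between_pages is not None:
            between_pages()  # test hook: simulates other clients getting the lock


server = ToyNameNode(["a", "b", "c", "d", "e", "f"])
before = server.snapshot()
pending = [("create", "aa"), ("delete", "e")]


def other_client():
    # Apply the queued changes once, in the gap after the first page.
    while pending:
        op, name = pending.pop(0)
        getattr(server, op)(name)


listing = list_status(server, page_size=2, between_pages=other_client)
after = server.snapshot()
print("before :", before)
print("listing:", listing)
print("after  :", after)
```

In this run "aa" is created behind the cursor and "e" is deleted ahead of it,
so the assembled listing misses a file that exists at the end and differs from
both snapshots. A blobstore client paging through keys with a continuation
marker would be exposed to the same effect.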

