hadoop-hdfs-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject [HDFS] result order of getFileBlockLocations() and listFiles()?
Date Wed, 29 Oct 2014 20:55:47 GMT
Hi guys,

I am trying to implement a simple experimental program (not for
production). It invokes FileSystem.listFiles() to get the list of files
under an HDFS folder, and then uses FileSystem.getFileBlockLocations() to
get the replica locations of each file's blocks.

Since it is a controlled environment, I can make sure the files are static,
and I don't need to worry about datanode crashes, failover, etc.

Assume that within a small time window (say, 1 minute), hundreds to
thousands of clients invoke the same program to look up the same folder.
Will the above two APIs guarantee the *same result in the same order* for
all clients?

To elaborate a bit more, say a folder called /dfs/dn/user/data contains
three files: file1, file2, and file3. If client1 gets:
listFiles() : file1,file2,file3
getFileBlockLocations(file1) -> datanode1, datanode3, datanode6

Will all other clients get the same information (I think so), and in the
same order? Or do I have to sort on each client to guarantee the order?
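
For reference, the client-side sort I have in mind could look roughly like
the sketch below. It does not assume any ordering guarantee from HDFS; it
just normalizes whatever each client receives. StableOrder and its methods
are hypothetical names of mine, not part of Hadoop. In the real program I
assume the file names would come from the listFiles() results and the host
names from BlockLocation.getHosts(); here they are plain strings so the
sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Hadoop class): gives every client the same
// ordering regardless of the order in which the results arrived.
final class StableOrder {
    private StableOrder() {}

    // Returns a sorted copy of the file names (natural String order).
    static List<String> sortedFiles(List<String> files) {
        List<String> copy = new ArrayList<>(files);
        Collections.sort(copy);
        return copy;
    }

    // Returns a sorted copy of the replica host names of one block.
    static List<String> sortedHosts(String[] hosts) {
        List<String> copy = new ArrayList<>(Arrays.asList(hosts));
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Sample values taken from the example above, in shuffled order.
        List<String> files = Arrays.asList("file3", "file1", "file2");
        String[] hosts = {"datanode6", "datanode1", "datanode3"};

        System.out.println(sortedFiles(files));
        System.out.println(sortedHosts(hosts));
    }
}
```

With this, two clients that receive the same set of files and hosts in
different orders would still end up with identical lists, at the cost of a
sort per lookup.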

Many thanks for your input.

