hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Building custom block placement policy. What is srcPath?
Date Thu, 24 Jul 2014 18:12:44 GMT
Hello,

(Inline)

On Thu, Jul 24, 2014 at 11:11 PM, Arjun Bakshi <bakshian@mail.uc.edu> wrote:
> Hi,
>
> I want to write a block placement policy that takes the size of the file
> being placed into account. Something like what is done in CoHadoop or BEEMR
> paper. I have the following questions:
>
> 1- What is srcPath in chooseTarget? Is it the path to the original
> un-chunked file, or it is a path to a single block, or something else? I
> added some code to blockplacementpolicydefault to print out the value of
> srcPath but the results look odd.

The arguments are documented in the BlockPlacementPolicy class javadoc:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java#L61

srcPath is the HDFS path of the file for which block placement
targets are being requested.

> 2- Will a simple new File(srcPath) will do?

Could you rephrase? srcPath is not a local file path, if that's what
you meant.
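
To illustrate the difference, here is a minimal sketch in plain Java
(no Hadoop classes). It assumes srcPath arrives as a plain string in
the HDFS namespace; the path and the NameNode address are made-up
values. java.io.File resolves the same string against the local
filesystem of whatever machine runs the code, which is a different
namespace.

```java
import java.io.File;
import java.net.URI;

// Illustration only: the path and NameNode address are made-up values.
class SrcPathDemo {
    // Hypothetical srcPath, assumed to be a plain string in the HDFS namespace.
    static final String SRC_PATH = "/user/ab/input/data.txt";

    // Builds the fully qualified HDFS URI that the same name would refer to.
    static URI asHdfsUri(String nameNodeAuthority, String srcPath) {
        return URI.create("hdfs://" + nameNodeAuthority + srcPath);
    }

    public static void main(String[] args) {
        // java.io.File interprets the string against the local filesystem.
        URI local = new File(SRC_PATH).toURI();
        URI hdfs = asHdfsUri("namenode.example.com:8020", SRC_PATH);

        System.out.println("java.io.File sees scheme: " + local.getScheme());
        System.out.println("HDFS sees scheme:         " + hdfs.getScheme());
        System.out.println("HDFS path component:      " + hdfs.getPath());
    }
}
```

The two URIs can share a path component yet name unrelated files, so
a File built from srcPath would not point at the HDFS file.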

> 3- I've spent time looking at hadoop source code. I can't find a way to go
> from srcPath in chooseTarget to a file size. Every function I think can do
> it, in FSNamesystem, FSDirectory, etc., is either non-public, or cannot be
> called from inside the blockmanagement package or blockplacement class.

When a new file is being created, the block placement policy is
called each time a new block is requested. At that point the file is
not complete, so there is no way to determine its final length; only
the requested block size is known. I'm not certain that
BlockPlacementPolicy is the right tool for your goal.

> How do I go from srcPath in blockplacement class to size of the file being
> placed?

Are you targeting in-progress files or completed files? Completed
files trigger placement policy calls only when replicas from the
original set are lost, become under-replicated, etc. Only for such
operations would it be possible to determine the actual full length
of the file (as explained above).

> Thank you,
>
> AB



-- 
Harsh J
