From: Arjun Bakshi
Date: Thu, 24 Jul 2014 17:25:06 -0400
Reply-To: user@hadoop.apache.org
Message-ID: <53D179B2.2060900@mail.uc.edu>
Subject: Re: Building custom block placement policy. What is srcPath?
References: <53D14544.1090408@mail.uc.edu>

Hi,

Thanks for the reply. It cleared up a few things.

I hadn't thought of under-replication situations, but I'll give them some thought now. That case should be easier since, as you've mentioned, by that time the namenode knows all the blocks that came from the same file as the under-replicated block.

For the most part, I was thinking of the case where a new file is being placed on the cluster. I think this is what you called in-progress files. Say a new 1 GB file needs to be placed onto the cluster. I want the system to take into account the fact that the file is 1 GB in size while placing all its blocks onto nodes in the cluster.
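To make the 1 GB example concrete, here is a rough sketch of the arithmetic only. It assumes a 128 MB block size (an assumption for illustration; the actual value depends on the `dfs.blocksize` setting in use), and the class and method names are hypothetical, not part of Hadoop.

```java
// Illustration only: plain Java arithmetic, no Hadoop classes involved.
class BlockCountSketch {
    // Assumed block size of 128 MB; the real value comes from configuration.
    static final long ASSUMED_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold fileSize bytes (ceiling division).
    static long blocksFor(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Size in bytes of the final block; 0 for an empty file.
    static long lastBlockSize(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % blockSize;
        return remainder == 0 ? blockSize : remainder;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        // A 1 GB file with 128 MB blocks needs 8 placement decisions.
        System.out.println(blocksFor(oneGb, ASSUMED_BLOCK_SIZE)); // prints 8
    }
}
```

Under these assumptions, a size-aware policy would have to make 8 separate placement decisions for the one file, which is why knowing the total size up front matters.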

I'm not clear on where the file is broken down into blocks/chunks: in which class, on which file system (local or HDFS), or at what point in the process flow. Knowing that will help me come up with a solution.

Where is the last place, in terms of a function or a point in the process, where I can find the name of the original file that is being placed on the system?

I'm reading the NameNode and FSNamesystem code to see if I can do what I want from there. Any suggestions will be appreciated.

Thank you,

AB

On 07/24/2014 02:12 PM, Harsh J wrote:
> Hello,
>
> (Inline)
>
> On Thu, Jul 24, 2014 at 11:11 PM, Arjun Bakshi wrote:
>> Hi,
>>
>> I want to write a block placement policy that takes the size of the file
>> being placed into account. Something like what is done in the CoHadoop or
>> BEEMR papers. I have the following questions:
>>
>> 1- What is srcPath in chooseTarget? Is it the path to the original
>> un-chunked file, or is it a path to a single block, or something else? I
>> added some code to BlockPlacementPolicyDefault to print out the value of
>> srcPath but the results look odd.
> The arguments are documented in the interface javadoc:
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java#L61
>
> The srcPath is the file path of the file on HDFS for which the block
> placement targets are being requested.
>
>> 2- Will a simple new File(srcPath) do?
> Please rephrase? The srcPath is not a local file if that's what you meant.
>
>> 3- I've spent time looking at hadoop source code. I can't find a way to go
>> from srcPath in chooseTarget to a file size. Every function I think can do
>> it, in FSNamesystem, FSDirectory, etc., is either non-public, or cannot be
>> called from inside the blockmanagement package or blockplacement class.
> The block placement is something that, within the context of a new file
> creation, is called when requesting a new block.
> At this point the file is not complete, so there is no way to determine
> its actual length, but only the requested block size. I'm not certain if
> BlockPlacementPolicy is what will solve your goal.
>
>> How do I go from srcPath in blockplacement class to size of the file being
>> placed?
> Are you targeting in-progress files or completed files? The latter
> would result in placement policy calls only if there is
> under-replication, loss, etc. of block replicas of the original set.
> Only for such operations would it be possible to determine the
> actual full length of the file (as explained above).
>
>> Thank you,
>>
>> AB
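
The point made in the reply above, that only the requested block size is known while a file is still being written, can be illustrated with a toy model. This is a sketch under stated assumptions, not HDFS code: `PlacementHook` and `simulateWrite` are hypothetical names invented here, and the model simply assumes that a placement decision is requested each time a new block is started.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Hadoop.
class InProgressFileSketch {
    // Hypothetical stand-in for a placement callback; NOT the real BlockPlacementPolicy API.
    interface PlacementHook {
        void onNewBlock(String srcPath, long blockSize, long bytesWrittenSoFar);
    }

    // Simulates writing totalBytes to srcPath. The hook is called once per new block
    // and is told only the path, the block size, and the bytes written so far.
    // Returns the bytes-written-so-far value seen at each placement request.
    static List<Long> simulateWrite(String srcPath, long totalBytes, long blockSize,
                                    PlacementHook hook) {
        if (totalBytes < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and blockSize > 0");
        }
        List<Long> seenAtRequest = new ArrayList<>();
        long written = 0;
        while (written < totalBytes) {
            hook.onNewBlock(srcPath, blockSize, written);
            seenAtRequest.add(written);
            // Fill the block just requested, or write whatever data remains.
            written += Math.min(blockSize, totalBytes - written);
        }
        return seenAtRequest;
    }

    public static void main(String[] args) {
        // Hypothetical path and tiny sizes, chosen only to keep the output short.
        List<Long> seen = simulateWrite("/user/ab/example.dat", 300, 128,
                (path, blockSize, soFar) -> { /* a policy would choose targets here */ });
        System.out.println(seen); // prints [0, 128, 256]
    }
}
```

In this model the final length (300) never appears among the values visible at placement time, which matches the explanation that the full length is only available once the file is complete.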