From: Harsh J
Date: Thu, 6 Sep 2012 18:35:59 +0530
Subject: Re: Transfer blocks from one datanode to another
To: hdfs dev
Cc: adrian.liu@sas.com

Hi,

Please do not use the general@ lists for development/usage questions;
that list is meant for project-level discussions alone. Thanks! :)
I've moved this mail to hdfs-dev@hadoop.apache.org; please use this
list going forward.

My reply inline:

On Thu, Sep 6, 2012 at 3:12 PM, Adrian (Xinyu) Liu wrote:
> Hi All,
>
> I am currently working with HDFS and implementing some functionality
> based on the HDFS API. As I understand it, a file is divided into
> blocks that are distributed across different datanodes with a certain
> replication factor. I am looking for HDFS APIs that can meet the
> following requirements:
>
> 1. Given a specific filename (and related information) for a file
>    already uploaded to HDFS, retrieve how many blocks it has, which
>    blocks each datanode contains, etc.

This isn't fully possible through the public APIs alone.
FileSystem#getFileBlockLocations will tell you which hosts carry the
blocks of a file (a list of hosts for each block in the file).
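As a quick illustration (not part of the original question), a minimal
sketch of that public-API route might look like the following; the file
path is hypothetical, and a reachable cluster via the default
configuration is assumed:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path to a file already uploaded to HDFS.
        Path file = new Path("/user/example/data.txt");
        FileStatus status = fs.getFileStatus(file);
        // One BlockLocation per block, covering the whole file length.
        BlockLocation[] locations =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (int i = 0; i < locations.length; i++) {
            System.out.println("block " + i
                + " offset=" + locations[i].getOffset()
                + " hosts=" + java.util.Arrays.toString(
                      locations[i].getHosts()));
        }
    }
}
```

Note this gives you hosts per block, but not the block IDs themselves.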
See
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)

For the list of block IDs, you'd have to pull from a DFSClient
instance, which calls the (NameNode-side) ClientProtocol's
getBlockLocations(…) method. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java?view=markup

> 2. Given a specific filename, source datanode, specific block ID,
>    destination datanode, and related information, transfer the block
>    from the source node to the destination node.

This needs to be done via the DataTransferProtocol, specifically its
replaceBlock(…) method. See the interface at
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java?view=markup

> I've read several materials and API references about HDFS and can't
> find appropriate ones. The objective of this mail is to confirm
> whether such APIs exist, and if so, what they are (especially the
> second one: transferring a specific block of a specific file from one
> datanode to another).
>
> There is a tool called Balancer in the HDFS package; I am reading its
> source code, but it's too intricate to trace. Can anyone help me?

In the Balancer sources, see the final replaceBlock(…) call made at
L376 in
http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java?view=markup,
and then trace backwards from that point to see how it is built up.

Feel free to send across any more questions you have!

-- 
Harsh J
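P.S. A rough sketch of the DFSClient route for block IDs, for anyone
following along. DFSClient is an internal, non-public class (its
methods wrap ClientProtocol#getBlockLocations and may change between
releases); the file path here is hypothetical and a live cluster is
assumed:

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

public class BlockIdsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Internal client; talks to the NameNode via ClientProtocol.
        DFSClient client =
            new DFSClient(FileSystem.getDefaultUri(conf), conf);
        try {
            // Ask for all blocks of a hypothetical file
            // (offset 0, length Long.MAX_VALUE).
            LocatedBlocks blocks = client.getLocatedBlocks(
                "/user/example/data.txt", 0, Long.MAX_VALUE);
            for (LocatedBlock lb : blocks.getLocatedBlocks()) {
                System.out.println("blockId="
                    + lb.getBlock().getBlockId()
                    + " datanodes=" + java.util.Arrays.toString(
                          lb.getLocations()));
            }
        } finally {
            client.close();
        }
    }
}
```

With the block ID and the datanode list in hand, the Balancer's use of
replaceBlock(…) is the reference for actually moving a block.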