From: Yehia Elshater <y.z.elshater@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 27 Aug 2014 22:58:00 -0400
Subject: Re: Local file system to access hdfs blocks
Hi Demai,

Sorry, I missed that you had already tried this out. I think you can construct the block location on the local file system if you have the block pool ID and the block ID. If you are using the Cloudera distribution, the default location is under /dfs/dn (the value of the dfs.data.dir / dfs.datanode.data.dir configuration keys).
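A minimal sketch of that lookup, assuming the usual Hadoop 2.x on-disk layout (<dataDir>/current/<blockPoolId>/current/finalized/...). The data directory, block pool ID, and block ID below are placeholders, not real values:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

public class FindBlockFile {

    // Walk <dataDir>/current/<blockPoolId>/current/finalized looking for
    // blk_<blockId>. The subdir nesting under "finalized" is an internal
    // detail, so a recursive search is safer than computing the path directly.
    static Optional<Path> findBlock(Path dataDir, String blockPoolId, long blockId)
            throws IOException {
        Path finalized = dataDir.resolve("current")
                                .resolve(blockPoolId)
                                .resolve("current")
                                .resolve("finalized");
        try (Stream<Path> files = Files.walk(finalized)) {
            return files
                .filter(p -> p.getFileName().toString().equals("blk_" + blockId))
                .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; in practice, take them from the fsck output.
        findBlock(Paths.get("/dfs/dn"),
                  "BP-929597290-192.168.1.10-1409000000000",
                  1073741825L)
            .ifPresent(p -> System.out.println("block file: " + p));
    }
}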

Thanks
Yehia


On 27 August 2014 21:20, Yehia Elshater <y.z.elshater@gmail.com> wrote:
Hi Demai,

You can use the fsck utility like the following:

hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks

This will display all the information you need about the blocks of your file.
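The block lines it prints carry both pieces of information mentioned above: each one starts with the block pool ID (the BP-... prefix) followed by the block ID (blk_...). A made-up example of roughly what one such line looks like (the exact format varies by Hadoop version):

0. BP-929597290-192.168.1.10-1409000000000:blk_1073741825_1001 len=134217728 repl=3 [/rack1/10.0.0.1:50010, /rack2/10.0.0.2:50010, /rack3/10.0.0.3:50010]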

Hope it helps.
Yehia


On 27 August 2014 20:18, Demai Ni <nidmgg@gmail.com> wrote:
Hi, Stanley,

Many thanks. Your method works. For now, I can take a two-step approach:
1) call getFileBlockLocations to grab the HDFS BlockLocation[]
2) use a local file system call (like the find command) to match each block to a file on the local file system, as in the sketch below.
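A sketch of what step 1 could look like with the standard FileSystem API (/tmp/test.txt is just the example path from earlier in the thread). Note that BlockLocation exposes hosts, offsets, and lengths but not block IDs, so step 2 still needs fsck or a local search to find the actual block files:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        FileStatus status = fs.getFileStatus(new Path("/tmp/test.txt"));

        // One BlockLocation per block, listing the datanodes holding each replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " len=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
    }
}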

Maybe there is an existing Hadoop API that returns such info already?

Demai on the run

On Aug 26, 2014, at 9:14 PM, Stanley Shi <sshi@pivotal.io> wrote:

I am not sure this is what you want, but you can try this shell command:

find [DATANODE_DIR] -name [blockname]
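For example, against a CDH-style data directory and a made-up block ID:

find /dfs/dn -name 'blk_1073741825*'

(The trailing * also picks up the block's blk_<id>_<genstamp>.meta checksum file.)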


On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nidmgg@gmail.com> wrote:
Hi, folks,

I am new in this area and hoping to get a couple of pointers.

I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).

I am wondering whether there is an interface to get each HDFS block's information in terms of the local file system.

For example, I can use "hadoop fsck /tmp/test.txt -files -blocks -racks" to get each block ID and its replicas on the nodes, such as: repl=3 [/rack/hdfs01, /rack/hdfs02...]

With such info, is there a way to
1) log in to hdfs01 and read the block directly at the local file system level?


Thanks

Demai on the run



--
Regards,
Stanley Shi,


