hadoop-hdfs-user mailing list archives

From Stanley Shi <s...@pivotal.io>
Subject Re: Local file system to access hdfs blocks
Date Fri, 29 Aug 2014 06:34:36 GMT
*BP-13-7914115-10.122.195.197-14909166276345 is the block pool information;*
*blk_1073742025 is the block name.*

*These names are "private" to the HDFS system and users should not use them,
right?*
*But if you really want to know this, you can check the fsck code to see
whether they are available.*
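
For what it's worth, pulling the two parts out of the identifier fsck prints is trivial in code. The sketch below just splits the "BP-...:blk_..." string into its block pool id and block name; treat it as illustrative only, since the format is internal to HDFS and may change between releases. (On Hadoop 2.x you may also be able to cast the BlockLocation objects that DistributedFileSystem.getFileBlockLocations() returns to HdfsBlockLocation and call getLocatedBlock().getBlock() to get the same information without fsck; worth verifying on your version.)

```java
// Illustrative only: split the "BP-...:blk_..." identifier printed by
// `hadoop fsck <path> -files -blocks` into block pool id and block name.
// The format is internal to HDFS and may change between releases.
public class BlockIdParser {
    public static String[] parse(String fsckBlockId) {
        int colon = fsckBlockId.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("unexpected format: " + fsckBlockId);
        }
        String blockPoolId = fsckBlockId.substring(0, colon);   // e.g. BP-13-...
        String blockName   = fsckBlockId.substring(colon + 1);  // e.g. blk_1073742025
        return new String[] { blockPoolId, blockName };
    }

    public static void main(String[] args) {
        String[] parts =
            parse("BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```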


On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <nidmgg@gmail.com> wrote:

> Stanley and all,
>
> thanks. I will write a client application to explore this path. A quick
> question again.
> Using the fsck command, I can retrieve all the necessary info
> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
> .....
>  *BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025*
> len=8 repl=2
> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>
> However, using getFileBlockLocations(), I can't get the block name/id
> info, such as
> *BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025*. It seems
> that BlockLocation doesn't expose that info publicly:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>
> Is there another entry point? Something fsck is using? Thanks
>
> Demai
>
>
>
>
> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <sshi@pivotal.io> wrote:
>
>> As far as I know, there's no combination of Hadoop APIs that can do that.
>> You can easily get the location of the block (on which DN), but there's
>> no way to get the local address of that block file.
>>
>>
>>
>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nidmgg@gmail.com> wrote:
>>
>>> Yehia,
>>>
>>> No problem at all. I really appreciate your willingness to help. Yeah,
>>> now I am able to get such information in two steps: the first step is
>>> either hadoop fsck or getFileBlockLocations(), and then I search the
>>> local filesystem; my cluster is using the CDH default, which is
>>> /dfs/dn
>>>
>>> I would like to do it programmatically, so I am wondering whether someone
>>> has already done it? Or, better, maybe a hadoop API call is already
>>> implemented for this exact purpose.
>>>
>>> Demai
>>>
>>>
>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elshater@gmail.com>
>>> wrote:
>>>
>>>> Hi Demai,
>>>>
>>>> Sorry, I missed that you had already tried this out. I think you can
>>>> construct the block location on the local file system if you have the block
>>>> pool id and the block id. If you are using the Cloudera distribution, the
>>>> default location is under /dfs/dn (the value of the dfs.data.dir /
>>>> dfs.datanode.data.dir configuration keys).
>>>>
>>>> Thanks
>>>> Yehia
>>>>
>>>>
>>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elshater@gmail.com> wrote:
>>>>
>>>>> Hi Demai,
>>>>>
>>>>> You can use the fsck utility like the following:
>>>>>
>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>
>>>>> This will display all the information you need about the blocks of
>>>>> your file.
>>>>>
>>>>> Hope it helps.
>>>>> Yehia
>>>>>
>>>>>
>>>>> On 27 August 2014 20:18, Demai Ni <nidmgg@gmail.com> wrote:
>>>>>
>>>>>> Hi, Stanley,
>>>>>>
>>>>>> Many thanks. Your method works. For now, I can use a two-step
>>>>>> approach:
>>>>>> 1) getFileBlockLocations() to grab the hdfs BlockLocation[]
>>>>>> 2) use a local file system call (like the find command) to match the
>>>>>> block to files on the local file system.
>>>>>>
>>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>>
>>>>>> Demai on the run
>>>>>>
>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <sshi@pivotal.io> wrote:
>>>>>>
>>>>>> I am not sure this is what you want but you can try this shell
>>>>>> command:
>>>>>>
>>>>>> find [DATANODE_DIR] -name [blockname]
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nidmgg@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, folks,
>>>>>>>
>>>>>>> New in this area. Hopefully to get a couple pointers.
>>>>>>>
>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>>
>>>>>>> I am wondering whether there is an interface to get each hdfs
>>>>>>> block's information in terms of the local file system.
>>>>>>>
>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks
>>>>>>> -racks" to get the blockID and its replicas on the nodes, such as:
>>>>>>> repl=3 [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>
>>>>>>> With such info, is there a way to
>>>>>>> 1) log in to hdfs01, and read the block directly at the local file
>>>>>>> system level?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Demai on the run
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> *Stanley Shi,*
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>
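
PS: if you do end up matching blocks to local files from a client application, the "find" step above is just a recursive filename search over the datanode's data directory. A minimal Java sketch, run on the datanode itself, assuming the CDH default data dir /dfs/dn (whatever dfs.datanode.data.dir is on your cluster); note the layout under current/<blockPoolId>/current/finalized/ is internal to HDFS and can change between releases:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class BlockLocator {
    // Walk a datanode data directory looking for the file that backs a
    // block, e.g. blk_1073742025. The matching .meta file sits next to it.
    public static Path findBlockFile(Path dataDir, String blockName) throws IOException {
        try (Stream<Path> files = Files.walk(dataDir)) {
            return files
                .filter(p -> p.getFileName().toString().equals(blockName))
                .findFirst()
                .orElse(null); // null: block not stored on this datanode
        }
    }

    public static void main(String[] args) throws IOException {
        // /dfs/dn is an assumption (CDH default); adjust to your cluster.
        Path hit = findBlockFile(Paths.get("/dfs/dn"), "blk_1073742025");
        System.out.println(hit == null ? "not found on this node" : hit.toString());
    }
}
```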


