hadoop-mapreduce-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: Local file system to access hdfs blocks
Date Sat, 30 Aug 2014 01:25:24 GMT
Stanley, 

Thanks. 

Btw, I found this jira, HDFS-2246 ("Shortcut a local client reads to a local datanode"), which probably matches what I am looking for.

Demai on the run

On Aug 28, 2014, at 11:34 PM, Stanley Shi <sshi@pivotal.io> wrote:

> BP-13-7914115-10.122.195.197-14909166276345 is the block pool information;
> blk_1073742025 is the block name.
> 
> These names are "private" to the HDFS system and users should not use them, right?
> But if you really want to know this, you can check the fsck code to see whether
> they are available.
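> 
> If you do need it programmatically, a rough, untested sketch is below. It leans
> on HdfsDataInputStream and LocatedBlock, which are HDFS-private classes, so
> there is no API-stability guarantee and names may shift between releases:
> 
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.hdfs.client.HdfsDataInputStream;
>     import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
>     import org.apache.hadoop.hdfs.protocol.LocatedBlock;
> 
>     public class ListBlockIds {
>       public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration());
>         // The cast only holds when the path actually lives on HDFS.
>         HdfsDataInputStream in =
>             (HdfsDataInputStream) fs.open(new Path(args[0]));
>         try {
>           for (LocatedBlock lb : in.getAllBlocks()) {
>             // Prints e.g. BP-13-7914115-...:blk_1073742025
>             System.out.println(lb.getBlock().getBlockPoolId() + ":"
>                 + lb.getBlock().getBlockName());
>             for (DatanodeInfo dn : lb.getLocations()) {
>               System.out.println("  " + dn.getXferAddr());
>             }
>           }
>         } finally {
>           in.close();
>         }
>       }
>     }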
> 
> 
> On Fri, Aug 29, 2014 at 8:13 AM, Demai Ni <nidmgg@gmail.com> wrote:
>> Stanley and all,
>> 
>> Thanks. I will write a client application to explore this path. A quick question again.
>> Using the fsck command, I can retrieve all the necessary info
>> $ hadoop fsck /tmp/list2.txt -files -blocks -racks
>> .....
>>  BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025 len=8 repl=2
>> [/default/10.122.195.198:50010, /default/10.122.195.196:50010]
>> 
>> However, using getFileBlockLocations(), I can't get the block name/id info, such
>> as BP-13-7914115-10.122.195.197-14909166276345:blk_1073742025.
>> It seems BlockLocation doesn't expose that info publicly: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/BlockLocation.html
>> 
>> Is there another entry point? Something fsck is using? Thanks.
>> 
>> Demai
>> 
>> 
>> 
>> 
>> On Wed, Aug 27, 2014 at 11:09 PM, Stanley Shi <sshi@pivotal.io> wrote:
>>> As far as I know, there's no combination of Hadoop APIs that can do that.
>>> You can easily get the location of the block (on which DN), but there's no way
>>> to get the local address of that block file.
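>>> 
>>> For what it's worth, getting the hosts (not the block files) with the public
>>> API is easy; a minimal sketch, using the /tmp/test.txt example file:
>>> 
>>>     import java.util.Arrays;
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.fs.BlockLocation;
>>>     import org.apache.hadoop.fs.FileStatus;
>>>     import org.apache.hadoop.fs.FileSystem;
>>>     import org.apache.hadoop.fs.Path;
>>> 
>>>     public class ShowBlockHosts {
>>>       public static void main(String[] args) throws Exception {
>>>         FileSystem fs = FileSystem.get(new Configuration());
>>>         FileStatus st = fs.getFileStatus(new Path("/tmp/test.txt"));
>>>         BlockLocation[] locs = fs.getFileBlockLocations(st, 0, st.getLen());
>>>         for (BlockLocation loc : locs) {
>>>           // Offset/length within the file plus replica hosts -- but
>>>           // no block pool id or block id; those are not exposed here.
>>>           System.out.println(loc.getOffset() + " len=" + loc.getLength()
>>>               + " hosts=" + Arrays.toString(loc.getHosts()));
>>>         }
>>>       }
>>>     }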
>>> 
>>> 
>>> 
>>> On Thu, Aug 28, 2014 at 11:54 AM, Demai Ni <nidmgg@gmail.com> wrote:
>>>> Yehia,
>>>> 
>>>> No problem at all. I really appreciate your willingness to help. Yeah, now
>>>> I am able to get such information through two steps: the first step is either
>>>> hadoop fsck or getFileBlockLocations(), and then I search the local
>>>> filesystem; my cluster is using the default from CDH, which is /dfs/dn.
>>>> 
>>>> I would like to do it programmatically, so I am wondering whether someone has
>>>> already done it? Or, even better, is there a Hadoop API call already
>>>> implemented for this exact purpose?
>>>> 
>>>> Demai
>>>> 
>>>> 
>>>> On Wed, Aug 27, 2014 at 7:58 PM, Yehia Elshater <y.z.elshater@gmail.com> wrote:
>>>>> Hi Demai,
>>>>> 
>>>>> Sorry, I missed that you had already tried this out. I think you can
>>>>> construct the block's location on the local file system if you have the
>>>>> block pool id and the block id. If you are using the Cloudera distribution,
>>>>> the default location is under /dfs/dn (the value of the dfs.data.dir /
>>>>> dfs.datanode.data.dir configuration keys).
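>>>>> 
>>>>> A sketch of the layout (the exact subdirectory structure varies by Hadoop
>>>>> version): a finalized block typically ends up under
>>>>> 
>>>>>     <dfs.datanode.data.dir>/current/<block-pool-id>/current/finalized/.../blk_<id>
>>>>>     <dfs.datanode.data.dir>/current/<block-pool-id>/current/finalized/.../blk_<id>_<genstamp>.meta
>>>>> 
>>>>> where the .meta file holds the block's checksums.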
>>>>> 
>>>>> Thanks
>>>>> Yehia 
>>>>> 
>>>>> 
>>>>> On 27 August 2014 21:20, Yehia Elshater <y.z.elshater@gmail.com> wrote:
>>>>>> Hi Demai,
>>>>>> 
>>>>>> You can use the fsck utility like the following:
>>>>>> 
>>>>>> hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks
>>>>>> 
>>>>>> This will display all the information you need about the blocks of your file.
>>>>>> 
>>>>>> Hope it helps.
>>>>>> Yehia
>>>>>> 
>>>>>> 
>>>>>> On 27 August 2014 20:18, Demai Ni <nidmgg@gmail.com> wrote:
>>>>>>> Hi, Stanley,
>>>>>>> 
>>>>>>> Many thanks. Your method works. For now, I can have a two-step approach:
>>>>>>> 1) getFileBlockLocations() to grab the HDFS BlockLocation[]
>>>>>>> 2) use a local file system call (like the find command) to match the
>>>>>>> block to files on the local file system.
>>>>>>> 
>>>>>>> Maybe there is an existing Hadoop API that already returns such info?
>>>>>>> 
>>>>>>> Demai on the run
>>>>>>> 
>>>>>>> On Aug 26, 2014, at 9:14 PM, Stanley Shi <sshi@pivotal.io> wrote:
>>>>>>> 
>>>>>>>> I am not sure this is what you want, but you can try this shell command:
>>>>>>>> 
>>>>>>>> find [DATANODE_DIR] -name [blockname]
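>>>>>>>> 
>>>>>>>> e.g., assuming the CDH default data dir and a hypothetical block name:
>>>>>>>> 
>>>>>>>>     find /dfs/dn -name 'blk_1073741825*'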
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Aug 26, 2014 at 6:42 AM, Demai Ni <nidmgg@gmail.com> wrote:
>>>>>>>>> Hi, folks,
>>>>>>>>> 
>>>>>>>>> New to this area. Hoping to get a couple of pointers.
>>>>>>>>> 
>>>>>>>>> I am using CentOS and have Hadoop set up using CDH 5.1 (Hadoop 2.3).
>>>>>>>>> 
>>>>>>>>> I am wondering whether there is an interface to get each HDFS block's
>>>>>>>>> information in terms of the local file system.
>>>>>>>>> 
>>>>>>>>> For example, I can use "hadoop fsck /tmp/test.txt -files -blocks -racks"
>>>>>>>>> to get the blockID and its replicas on the nodes, such as: repl=3
>>>>>>>>> [/rack/hdfs01, /rack/hdfs02...]
>>>>>>>>> 
>>>>>>>>> With such info, is there a way to
>>>>>>>>> 1) log in to hdfs01 and read the block directly at the local file
>>>>>>>>> system level?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> 
>>>>>>>>> Demai on the run
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Regards,
>>>>>>>> Stanley Shi,
>>>>>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Regards,
>>> Stanley Shi,
>>> 
> 
> 
> 
> -- 
> Regards,
> Stanley Shi,
> 
