hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: About block name and location.
Date Tue, 18 Oct 2011 05:31:29 GMT

Before you possibly end up duplicating work already done to improve
co-located client reads from DNs, I'd suggest looking at the JIRAs
https://issues.apache.org/jira/browse/HDFS-2246 and

Regarding your last requirement, about getting the path to the block
files - there's no public API available for that yet. The info is
held by the DataNode alone at the moment, and it is not exposed
directly (a client instead opens a transceiver and the DN does the
read work by itself).
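
As an aside, the fsck tool can at least print the per-file block IDs
and locations (though again not the DN-local paths), assuming you can
run it against the cluster:

  hadoop fsck /path/to/file -files -blocks -locations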

On Tue, Oct 18, 2011 at 8:35 AM, Yuduo <yuduozhou@gmail.com> wrote:
> Thanks, Uma! I'll try to figure it out following your directions.
> Best,
> Yuduo
> On 10/17/2011 10:51 PM, Uma Maheswara Rao G 72686 wrote:
>> ----- Original Message -----
>> From: Yuduo Zhou<yuduozhou@gmail.com>
>> Date: Tuesday, October 18, 2011 6:30 am
>> Subject: About block name and location.
>> To: hdfs-user@hadoop.apache.org
>>> Hi all,
>>> I'm a rookie with HDFS, so just a quick question: suppose I have
>>> a big file stored in HDFS; is there any way to generate a file
>>> containing all the information about the blocks belonging to this
>>> file? For example, a list of records in the format "block_id,
>>> length, offset, hosts[], local/path/to/this/block"?
>> FileSystem#getFileStatus(Path f) will give you some of this
>> information. FileStatus exposes the following fields:
>> Path path;
>> long length;
>> boolean isdir;
>> short block_replication;
>> long blocksize;
>> long modification_time;
>> long access_time;
>> FsPermission permission;
>> String owner;
>> String group;
>> Path symlink;
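>> (In code you read these through the FileStatus getters, e.g.
>> getLen(), getBlockSize(), getModificationTime(), and so on.)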
>> And to get the block locations and offsets you can use
>> FileSystem#getFileBlockLocations.
>> If you want it exactly in your format, I would suggest writing a
>> small wrapper in your app and formatting the output of the above
>> APIs.
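>> A minimal sketch of such a wrapper (untested, and the class name is
>> just for illustration; it prints offset, length and hosts, since the
>> block id and local path are not available through this API):
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.BlockLocation;
>> import org.apache.hadoop.fs.FileStatus;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>>
>> public class BlockInfoLister {
>>     public static void main(String[] args) throws Exception {
>>         Path file = new Path(args[0]);
>>         FileSystem fs = FileSystem.get(new Configuration());
>>         FileStatus status = fs.getFileStatus(file);
>>         // One BlockLocation per block of the file.
>>         BlockLocation[] blocks =
>>             fs.getFileBlockLocations(status, 0, status.getLen());
>>         for (BlockLocation block : blocks) {
>>             // Prints: offset, length, [host1, host2, ...]
>>             System.out.println(block.getOffset() + ", "
>>                 + block.getLength() + ", "
>>                 + java.util.Arrays.toString(block.getHosts()));
>>         }
>>     }
>> }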
>>> The purpose is to enable programs to only access blocks on the same
>>> node, to utilize block locality.
>> Hadoop already supports it.
>>> I can retrieve most information using getFileBlockLocations() but I
>>> didn't find how to gather information about the local path.
>> AFAIK, local files are written as just normal files, so Hadoop will
>> not split local files into blocks. It does that only in the DFS case.
>>> Thanks,
>>> Yuduo

Harsh J
