hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: datanode files list
Date Mon, 21 Apr 2008 17:31:31 GMT

It is kind of odd that you are doing this. It really sounds like you are
duplicating what Hadoop already does.

Why not just run a map process and have Hadoop figure out which blocks are
where?

Can you say more about *why* you are doing this, not just what you are
trying to do?
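To make "have Hadoop figure out which blocks are where" concrete, here is a toy Java model of data-local work assignment. It is a sketch under stated assumptions, not Hadoop's actual scheduler (the real framework hands out map tasks as nodes ask for work and is considerably more involved): each block is processed exactly once, by one host that already stores a replica, preferring the least-loaded such host. The class name `LocalityAssigner`, the block ids and the host names are all hypothetical.

```java
import java.util.*;

/**
 * Toy model (hypothetical, NOT Hadoop's real scheduler) of data-local assignment:
 * every block is processed once, by one of the hosts holding a replica of it.
 */
public class LocalityAssigner {

    /**
     * @param blockHosts block id -> hosts holding a replica of that block
     * @return block id -> the single host chosen to process it
     */
    static Map<String, String> assign(Map<String, List<String>> blockHosts) {
        Map<String, Integer> load = new HashMap<>();      // blocks assigned so far, per host
        Map<String, String> assignment = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : blockHosts.entrySet()) {
            String best = null;
            // Pick the least-loaded replica holder; ties go to the first host listed.
            for (String host : e.getValue()) {
                int hostLoad = load.getOrDefault(host, 0);
                if (best == null || hostLoad < load.getOrDefault(best, 0)) {
                    best = host;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("block " + e.getKey() + " has no replicas");
            }
            assignment.put(e.getKey(), best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical cluster metadata; a TreeMap keeps iteration order deterministic.
        Map<String, List<String>> blockHosts = new TreeMap<>();
        blockHosts.put("blk_1", List.of("node1", "node2"));
        blockHosts.put("blk_2", List.of("node1", "node2"));
        blockHosts.put("blk_3", List.of("node2", "node3"));
        System.out.println(assign(blockHosts));
    }
}
```

Here `main` prints `{blk_1=node1, blk_2=node2, blk_3=node3}`: every block is read locally, and no replicated block is read twice, which a "read everything that is local" loop on each node would not guarantee.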

On 4/21/08 10:28 AM, "Shimi K" <shimi.eng@gmail.com> wrote:

> I am using Hadoop HDFS as a distributed file system. On each DFS node I have
> another process which needs to read the local HDFS files.
> Right now I'm calling the NameNode in order to get the list of all the files
> in the cluster. For each file I check whether it is a local file (one of its
> locations is this node's host), and if it is, I read it.
> Disadvantages:
> * This solution works only if the file is not split across several nodes.
> * It involves the NameNode.
> * Each node needs to iterate over all the files in the cluster.
> 
> There must be a better way to do it. The ideal way would be to call the
> DataNode and get a list of its local files and their blocks.
> 
> On Mon, Apr 21, 2008 at 7:18 PM, Ted Dunning <tdunning@veoh.com> wrote:
> 
>> 
>> Datanodes don't necessarily contain complete files.  It is possible to
>> enumerate all files and to find out which datanodes host each block of
>> these files.
>> 
>> What did you need to do?
>> 
>> 
>> On 4/21/08 2:11 AM, "Shimi K" <shimi.eng@gmail.com> wrote:
>> 
>>> Is there a way to get the list of files on each datanode?
>>> I need to be able to get the names of all the files on a specific datanode.
>>> Is there a way to do it?
>> 
>> 
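The enumeration described above amounts to inverting the NameNode's file -> blocks -> hosts view into a per-host listing. The sketch below does that over a hypothetical in-memory map (the class `LocalBlockIndex`, the paths and the host names are made up for illustration). Against a real cluster the input would come from the NameNode, for example via `FileSystem.getFileBlockLocations` and `BlockLocation.getHosts()` in Hadoop's Java API; exact signatures differ between Hadoop versions, so treat that mapping as an assumption. It also shows why a whole-file locality check breaks down: a file split into several blocks is fully local to a host only if every one of its blocks has a replica there.

```java
import java.util.*;

/**
 * Hypothetical in-memory stand-in for NameNode metadata:
 * path -> list of blocks, where each block is the list of hosts holding a replica.
 */
public class LocalBlockIndex {

    /** One block of a file: the file path, its position in the file, and its replica hosts. */
    record Block(String file, int index, List<String> hosts) {}

    /** Invert path -> blocks -> hosts into host -> blocks stored on that host. */
    static Map<String, List<Block>> blocksByHost(Map<String, List<List<String>>> files) {
        Map<String, List<Block>> byHost = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> e : files.entrySet()) {
            List<List<String>> blocks = e.getValue();
            for (int i = 0; i < blocks.size(); i++) {
                Block b = new Block(e.getKey(), i, blocks.get(i));
                for (String host : b.hosts()) {
                    byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(b);
                }
            }
        }
        return byHost;
    }

    /** Paths whose every block has a replica on the given host (empty files count as local). */
    static Set<String> fullyLocalFiles(Map<String, List<List<String>>> files, String host) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<List<String>>> e : files.entrySet()) {
            boolean allLocal = e.getValue().stream().allMatch(hosts -> hosts.contains(host));
            if (allLocal) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> files = new TreeMap<>();
        // "/logs/a" is split into two blocks on different hosts; "/logs/b" is a single block.
        files.put("/logs/a", List.of(List.of("node1", "node2"), List.of("node2", "node3")));
        files.put("/logs/b", List.of(List.of("node1", "node3")));

        for (Block b : blocksByHost(files).get("node1")) {
            System.out.println("node1 holds " + b.file() + " block " + b.index());
        }
        System.out.println("fully local to node1: " + fullyLocalFiles(files, "node1"));
        System.out.println("fully local to node2: " + fullyLocalFiles(files, "node2"));
    }
}
```

With this data, node1 holds block 0 of both files but only `/logs/b` is fully local to it, while node2 holds both blocks of `/logs/a`.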

