Date: Mon, 21 Apr 2008 10:31:31 -0700
Subject: Re: datanode files list
From: Ted Dunning <tdunning@veoh.com>
To: <core-user@hadoop.apache.org>


It is kind of odd that you are doing this.  It really sounds like you are
duplicating what Hadoop already does.

Why not just run a map task and have Hadoop figure out which blocks are
where?

Can you say more about *why* you are doing this, not just what you are
trying to do?
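
To make the per-block view concrete, here is a minimal sketch of picking out
the blocks that have a replica on one host. It does not call Hadoop at all:
the Block class, the blocksOnHost helper, and the file and host names are
hypothetical stand-ins. In a real cluster the block-to-host mapping would come
from the NameNode (later Hadoop releases expose it through
FileSystem.getFileBlockLocations), so treat this only as an illustration of
the filtering step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalBlocks {

    /** Hypothetical stand-in for one block of a file and the hosts holding its replicas. */
    static final class Block {
        final String file;
        final long offset;
        final long length;
        final List<String> hosts;

        Block(String file, long offset, long length, String... hosts) {
            this.file = file;
            this.offset = offset;
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Returns the blocks that have a replica on the given host, in input order. */
    static List<Block> blocksOnHost(String host, List<Block> allBlocks) {
        List<Block> local = new ArrayList<>();
        for (Block b : allBlocks) {
            if (b.hosts.contains(host)) {
                local.add(b);
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Made-up layout: /data/a.log has two blocks, only the first is on node1.
        List<Block> blocks = Arrays.asList(
            new Block("/data/a.log", 0L, 64L, "node1", "node2"),
            new Block("/data/a.log", 64L, 64L, "node2", "node3"),
            new Block("/data/b.log", 0L, 10L, "node1", "node3"));

        for (Block b : blocksOnHost("node1", blocks)) {
            System.out.println(b.file + " offset=" + b.offset + " len=" + b.length);
        }
    }
}
```

Note that in this made-up layout node1 holds only the first block of
/data/a.log, which is the case where a per-file check breaks down.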

On 4/21/08 10:28 AM, "Shimi K" <shimi.eng@gmail.com> wrote:

> I am using Hadoop HDFS as a distributed file system. On each DFS node I have
> another process which needs to read the local HDFS files.
> Right now I'm calling the NameNode in order to get the list of all the files
> in the cluster. For each file I check whether it is a local file (one of
> the locations is the host of the node); if it is, I read it.
> Disadvantages:
> * This solution works only if the file is not split.
> * It involves the NameNode.
> * Each node needs to iterate over all the files in the cluster.
> 
> There must be a better way to do it. The ideal way would be to call the
> DataNode and get a list of the local files and their blocks.
> 
> On Mon, Apr 21, 2008 at 7:18 PM, Ted Dunning <tdunning@veoh.com> wrote:
> 
>> 
>> Datanodes don't necessarily contain complete files.  It is possible to
>> enumerate all files and to find out which datanodes host different blocks
>> from these files.
>> 
>> What did you need to do?
>> 
>> 
>> On 4/21/08 2:11 AM, "Shimi K" <shimi.eng@gmail.com> wrote:
>> 
>>> Is there a way to get the list of files on each datanode?
>>> I need to be able to get all the names of the files on a specific
>>> datanode. Is there a way to do it?
>> 
>>