From: "Devaraj Das"
To: core-user@hadoop.apache.org
Subject: RE: [Map/Reduce][HDFS]
Date: Fri, 28 Mar 2008 21:08:36 +0530

Hi Jean, no, that is not directly possible. You have to pass your data through the DFS client for it to become part of the DFS (e.g. hadoop fs -put ..., or programmatically).

(Removing core-dev from this thread, since this is really a core-user question.)

> -----Original Message-----
> From: Jean-Pierre [mailto:jean-pierre.ocalan@247realmedia.com]
> Sent: Friday, March 28, 2008 8:58 PM
> To: core-user@hadoop.apache.org; core-dev
> Subject: Re: [Map/Reduce][HDFS]
>
> Hello,
>
> I'm not sure I've understood... actually I've already set this field
> in the configuration file. I think this field just specifies the
> master for the HDFS.
>
> My problem is that I have many machines, each with a bunch of files
> that represent the distributed data, and I want to use this
> distribution of data with Hadoop. Maybe there is another
> configuration file which would let me tell Hadoop how to use my file
> distribution. Is that possible? Or should I look at adapting my
> distribution of data to the Hadoop one?
>
> Anyway, thanks for your answer Peeyush.
>
> On Fri, 2008-03-28 at 16:22 +0530, Peeyush Bishnoi wrote:
> > Hello,
> >
> > Yes, you can do this by specifying in hadoop-site.xml the location
> > of the namenode where your data has already been distributed.
> >
> > ---------------------------------------------------------------
> > fs.default.name
> > ---------------------------------------------------------------
> >
> > Thanks
> >
> > ---
> > Peeyush
> >
> > On Thu, 2008-03-27 at 15:41 -0400, Jean-Pierre wrote:
> > > Hello,
> > >
> > > I'm working on a large amount of logs, and I've noticed that
> > > distributing the data over the network (./hadoop dfs -put input
> > > input) takes a lot of time.
> > >
> > > Let's say that my data is already distributed among the network;
> > > is there any way to tell Hadoop to use the already existing
> > > distribution?
> > >
> > > Thanks
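
[Archive note: the hadoop-site.xml snippet quoted above lost its XML markup; only the property name `fs.default.name` survived. For context, a property of that name is conventionally set like the following sketch — the hostname and port here are placeholders, not values from the original message:]

```xml
<!-- hadoop-site.xml: point clients at the namenode.
     "namenode.example.com:9000" is a hypothetical address. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>
```

Note that, as the reply explains, this only tells clients which namenode to talk to; it does not make pre-existing local files on each machine part of HDFS. Data still has to be written through the DFS client.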