From: Ted Dunning
To: core-dev
Date: Fri, 28 Mar 2008 08:53:04 -0700
Subject: Re: [Map/Reduce][HDFS]
In-Reply-To: <1206718060.7087.17.camel@JPBeast>

Try running dfs -put on each of the machines that has content.
That will give you good balance and should let you write at very high speed
(depending on your cluster size).

On 3/28/08 8:27 AM, "Jean-Pierre" wrote:

> Hello
>
> I'm not sure I've understood... actually I've already set this field in
> the configuration file. I think this field is just to specify the master
> for the HDFS.
>
> My problem is that I have many machines with, on each one, a bunch of
> files which represent the distributed data ... and I want to use this
> distribution of data with hadoop. Maybe there is another configuration
> file which allows me to tell hadoop how to use my file distribution.
> Is it possible? Should I look to adapt my distribution of data to the
> hadoop one?
>
> Anyway, thanks for your answer Peeyush.
>
> On Fri, 2008-03-28 at 16:22 +0530, Peeyush Bishnoi wrote:
>> hello,
>>
>> Yes, you can do this by specifying in hadoop-site.xml the location of
>> the namenode, where your data is already distributed.
>>
>> ---------------------------------------------------------------
>>
>> fs.default.name
>>
>>
>>
>> ---------------------------------------------------------------
>>
>> Thanks
>>
>> ---
>> Peeyush
>>
>>
>> On Thu, 2008-03-27 at 15:41 -0400, Jean-Pierre wrote:
>>
>>> Hello,
>>>
>>> I'm working on a large amount of logs, and I've noticed that the
>>> distribution of data on the network (./hadoop dfs -put input input)
>>> takes a lot of time.
>>>
>>> Let's say that my data is already distributed across the network; is
>>> there any way to tell hadoop to use the already existing
>>> distribution?
>>>
>>> Thanks
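
The suggestion at the top of this message (run dfs -put on each machine that already holds data) could be scripted roughly as below. This is only a sketch under several assumptions that do not come from the thread: the host names, the local directory, passwordless ssh to every machine, and the idea of giving each host its own DFS subdirectory are all hypothetical. HADOOP_BIN defaults to the ./hadoop used in the original question, but over ssh it would probably need to be the full path to the hadoop script on each machine. Depending on the Hadoop version, the parent DFS directory may also have to exist first. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: host names, paths, and the per-host DFS layout are assumptions.
HOSTS="${HOSTS:-node1 node2 node3}"      # hypothetical machines holding the logs
LOCAL_DIR="${LOCAL_DIR:-/data/logs}"     # hypothetical local directory on each machine
DFS_DIR="${DFS_DIR:-input}"              # DFS destination, as in "dfs -put input input"
HADOOP_BIN="${HADOOP_BIN:-./hadoop}"     # path to the hadoop script on each machine
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands, 0 = run them

# Build the upload command for one host. Each host writes to its own
# DFS subdirectory so that files with the same name do not collide.
put_command() {
    printf '%s dfs -put %s %s/%s' "$HADOOP_BIN" "$LOCAL_DIR" "$DFS_DIR" "$1"
}

# In dry-run mode, print one ssh command per host. Otherwise start one
# upload per host in the background and wait for all of them to finish.
put_all() {
    for host in $HOSTS; do
        cmd=$(put_command "$host")
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh $host $cmd"
        else
            ssh "$host" "$cmd" &
        fi
    done
    wait
}

put_all
```

Because every machine pushes its own files, the writes start from many nodes at once instead of funnelling through a single client, which is where the balance and speed mentioned above would come from.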