Subject: Re: Hadoop over Lustre?
From: Joel Welling <welling@psc.edu>
Reply-To: welling@psc.edu
To: core-user@hadoop.apache.org
Cc: welling@psc.edu
Organization: Pittsburgh Supercomputing Center
Date: Fri, 29 Aug 2008 13:23:13 -0400

Sorry; I'm picking this thread up after a couple of days' delay.

Setting fs.default.name to the equivalent of file:///path/to/lustre and
changing mapred.job.tracker to just a hostname and port does allow
map-reduce to start up. However, test jobs fail with the exceptions below.
It looks like TaskTracker.localizeJob is looking for job.xml in the local
filesystem; I would have expected it to look in Lustre. I can't find that
particular job.xml anywhere on the system after the run aborts, I'm
afraid; I guess it's getting cleaned up.
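For reference, the relevant stanza of my hadoop-site.xml now looks roughly
like this (the mount point and the host:port are placeholders, not our
actual values):

  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/lustre</value>           <!-- Lustre mount point -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.org:9001</value>  <!-- JobTracker host:port -->
  </property>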
Thanks,
-Joel

08/08/28 18:46:07 INFO mapred.FileInputFormat: Total input paths to process : 15
08/08/28 18:46:07 INFO mapred.FileInputFormat: Total input paths to process : 15
08/08/28 18:46:08 INFO mapred.JobClient: Running job: job_200808281828_0002
08/08/28 18:46:09 INFO mapred.JobClient:  map 0% reduce 0%
08/08/28 18:46:12 INFO mapred.JobClient: Task Id : attempt_200808281828_0002_m_000000_0, Status : FAILED
Error initializing attempt_200808281828_0002_m_000000_0:
java.io.IOException: file:/tmp/hadoop-welling/mapred/system/job_200808281828_0002/job.xml: No such file or directory
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:216)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
    at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:668)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1306)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:946)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1343)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2354)

08/08/28 18:46:12 WARN mapred.JobClient: Error reading task output http://foo.psc.edu:50060/tasklog?plaintext=true&taskid=attempt_200808281828_0002_m_000000_0&filter=stdout
08/08/28 18:46:12 WARN mapred.JobClient: Error reading task output http://foo.psc.edu:50060/tasklog?plaintext=true&taskid=attempt_200808281828_0002_m_000000_0&filter=stderr

On Mon, 2008-08-25 at 14:24 -0700, Konstantin Shvachko wrote:
> mapred.job.tracker is the address and port of the JobTracker - the main
> server that controls map-reduce jobs. Every task tracker needs to know
> the address in order to connect.
> Do you follow the docs, e.g. this one:
> http://wiki.apache.org/hadoop/GettingStartedWithHadoop
>
> Can you start a one-node cluster?
>
> > Are there standard tests of hadoop performance?
>
> There is the sort benchmark. We also run the DFSIO benchmark for read
> and write throughput.
>
> --Konstantin
>
> Joel Welling wrote:
> > So far no success, Konstantin - the Hadoop job seems to start up, but
> > fails immediately, leaving no logs. What is the appropriate setting
> > for mapred.job.tracker? The generic value references hdfs, but it
> > also has a port number - I'm not sure what that means.
> >
> > My cluster is small, but if I get this working I'd be very happy to
> > run some benchmarks. Are there standard tests of hadoop performance?
> >
> > -Joel
> > welling@psc.edu
> >
> > On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
> >> I think the solution should be easier than Arun and Steve advise.
> >> Lustre is already mounted as a local directory on each cluster
> >> machine, right? Say it is mounted on /mnt/lustre.
> >> Then you configure hadoop-site.xml and set
> >>
> >>   fs.default.name
> >>   file:///mnt/lustre
> >>
> >> and then you start map-reduce only, without hdfs, using
> >> start-mapred.sh.
> >>
> >> By this you basically redirect all FileSystem requests to Lustre,
> >> and you don't need data-nodes or the name-node.
> >>
> >> Please let me know if that works.
> >>
> >> Also it would be very interesting to have your experience shared on
> >> this list. Problems, performance - everything is quite interesting.
> >>
> >> Cheers,
> >> --Konstantin
> >>
> >> Joel Welling wrote:
> >>>> 2. Could you set up symlinks from the local filesystem, i.e.
> >>>> point every node at a local dir
> >>>>   /tmp/hadoop
> >>>> with each node pointing to a different subdir in the big
> >>>> filesystem?
> >>>
> >>> Yes, I could do that! Do I need to do it for the log directories
> >>> as well, or can they be shared?
> >>>
> >>> -Joel
> >>>
> >>> On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
> >>>> Joel Welling wrote:
> >>>>> Thanks, Steve and Arun. I'll definitely try to write something
> >>>>> based on the KFS interface. I think that for our applications
> >>>>> putting the mapper on the right rack is not going to be that
> >>>>> useful. A lot of our calculations are going to be disordered
> >>>>> stuff based on 3D spatial relationships like nearest-neighbor
> >>>>> finding, so things will be in a random access pattern most of
> >>>>> the time.
> >>>>>
> >>>>> Is there a way to set up the configuration for HDFS so that
> >>>>> different datanodes keep their data in different directories?
> >>>>> That would be a big help in the short term.
> >>>>
> >>>> Yes, but you'd have to push out a different config to each
> >>>> datanode.
> >>>>
> >>>> 1. I have some stuff that could help there, but it's not ready
> >>>> for production use yet [1].
> >>>>
> >>>> 2. Could you set up symlinks from the local filesystem, i.e.
> >>>> point every node at a local dir
> >>>>   /tmp/hadoop
> >>>> with each node pointing to a different subdir in the big
> >>>> filesystem?
> >>>>
> >>>> [1]
> >>>> http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
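P.S. For the symlink idea quoted above, here's roughly what I'd run on
each node; the /mnt/lustre path and the per-hostname naming scheme are my
own guesses, not anything prescribed:

  # Give each node its own subdirectory of the shared Lustre mount,
  # while every node's config keeps pointing at the same local path.
  NODE_DIR="/mnt/lustre/hadoop/$(hostname -s)"
  mkdir -p "$NODE_DIR"
  ln -sfn "$NODE_DIR" /tmp/hadoop   # /tmp/hadoop now resolves per-node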