Subject: Re: Hadoop over Lustre?
From: Joel Welling <welling@psc.edu>
Reply-To: welling@psc.edu
To: core-user@hadoop.apache.org
Cc: welling@psc.edu
Organization: Pittsburgh Supercomputing Center
Date: Fri, 29 Aug 2008 13:23:13 -0400

Sorry; I'm picking this thread up after a couple of days' delay.

Setting fs.default.name to the equivalent of file:///path/to/lustre and
changing mapred.job.tracker to just a hostname and port does allow
map-reduce to start up. However, test jobs fail with the exceptions below.
It looks like TaskTracker.localizeJob is looking for job.xml in the local
filesystem; I would have expected it to look in Lustre. I can't find that
particular job.xml anywhere on the system after the run aborts, I'm
afraid; I guess it's getting cleaned up.
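For reference, the relevant stanza of my hadoop-site.xml now looks roughly
like this (the mount point and the host:port are placeholders, not our
actual values):

  <property>
    <name>fs.default.name</name>
    <value>file:///mnt/lustre</value>           <!-- Lustre mount point -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.org:9001</value>  <!-- JobTracker host:port -->
  </property>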
Thanks,
-Joel

08/08/28 18:46:07 INFO mapred.FileInputFormat: Total input paths to process : 15
08/08/28 18:46:07 INFO mapred.FileInputFormat: Total input paths to process : 15
08/08/28 18:46:08 INFO mapred.JobClient: Running job: job_200808281828_0002
08/08/28 18:46:09 INFO mapred.JobClient:  map 0% reduce 0%
08/08/28 18:46:12 INFO mapred.JobClient: Task Id : attempt_200808281828_0002_m_000000_0, Status : FAILED
Error initializing attempt_200808281828_0002_m_000000_0:
java.io.IOException: file:/tmp/hadoop-welling/mapred/system/job_200808281828_0002/job.xml: No such file or directory
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:216)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
    at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:668)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1306)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:946)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1343)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2354)

08/08/28 18:46:12 WARN mapred.JobClient: Error reading task output http://foo.psc.edu:50060/tasklog?plaintext=true&taskid=attempt_200808281828_0002_m_000000_0&filter=stdout
08/08/28 18:46:12 WARN mapred.JobClient: Error reading task output http://foo.psc.edu:50060/tasklog?plaintext=true&taskid=attempt_200808281828_0002_m_000000_0&filter=stderr

On Mon, 2008-08-25 at 14:24 -0700, Konstantin Shvachko wrote:
> mapred.job.tracker is the address and port of the JobTracker - the main
> server that controls map-reduce jobs. Every task tracker needs to know
> the address in order to connect.
> Do you follow the docs, e.g. this one:
> http://wiki.apache.org/hadoop/GettingStartedWithHadoop
>
> Can you start a one-node cluster?
>
> > Are there standard tests of hadoop performance?
>
> There is the sort benchmark. We also run the DFSIO benchmark for read
> and write throughput.
>
> --Konstantin
>
> Joel Welling wrote:
> > So far no success, Konstantin - the Hadoop job seems to start up, but
> > fails immediately, leaving no logs. What is the appropriate setting
> > for mapred.job.tracker? The generic value references hdfs, but it
> > also has a port number - I'm not sure what that means.
> >
> > My cluster is small, but if I get this working I'd be very happy to
> > run some benchmarks. Are there standard tests of hadoop performance?
> >
> > -Joel
> > welling@psc.edu
> >
> > On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
> >> I think the solution should be easier than Arun and Steve advise.
> >> Lustre is already mounted as a local directory on each cluster
> >> machine, right? Say it is mounted on /mnt/lustre.
> >> Then you configure hadoop-site.xml and set
> >>
> >>   fs.default.name
> >>   file:///mnt/lustre
> >>
> >> and then you start map-reduce only, without hdfs, using
> >> start-mapred.sh.
> >>
> >> By this you basically redirect all FileSystem requests to Lustre,
> >> and you don't need data-nodes or the name-node.
> >>
> >> Please let me know if that works.
> >>
> >> Also it would be very interesting to have your experience shared on
> >> this list. Problems, performance - everything is quite interesting.
> >>
> >> Cheers,
> >> --Konstantin
> >>
> >> Joel Welling wrote:
> >>>> 2. Could you set up symlinks from the local filesystem, i.e.
> >>>> point every node at a local dir
> >>>>   /tmp/hadoop
> >>>> with each node pointing to a different subdir in the big
> >>>> filesystem?
> >>>
> >>> Yes, I could do that! Do I need to do it for the log directories
> >>> as well, or can they be shared?
> >>>
> >>> -Joel
> >>>
> >>> On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
> >>>> Joel Welling wrote:
> >>>>> Thanks, Steve and Arun. I'll definitely try to write something
> >>>>> based on the KFS interface. I think that for our applications
> >>>>> putting the mapper on the right rack is not going to be that
> >>>>> useful. A lot of our calculations are going to be disordered
> >>>>> stuff based on 3D spatial relationships like nearest-neighbor
> >>>>> finding, so things will be in a random access pattern most of
> >>>>> the time.
> >>>>>
> >>>>> Is there a way to set up the configuration for HDFS so that
> >>>>> different datanodes keep their data in different directories?
> >>>>> That would be a big help in the short term.
> >>>>
> >>>> Yes, but you'd have to push out a different config to each
> >>>> datanode.
> >>>>
> >>>> 1. I have some stuff that could help there, but it's not ready
> >>>> for production use yet [1].
> >>>>
> >>>> 2. Could you set up symlinks from the local filesystem, i.e.
> >>>> point every node at a local dir
> >>>>   /tmp/hadoop
> >>>> with each node pointing to a different subdir in the big
> >>>> filesystem?
> >>>>
> >>>> [1]
> >>>> http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
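P.S. For the symlink idea quoted above, here's roughly what I'd run on
each node; the /mnt/lustre path and the per-hostname naming scheme are my
own guesses, not anything prescribed:

  # Give each node its own subdirectory of the shared Lustre mount,
  # while every node's config keeps pointing at the same local path.
  NODE_DIR="/mnt/lustre/hadoop/$(hostname -s)"
  mkdir -p "$NODE_DIR"
  ln -sfn "$NODE_DIR" /tmp/hadoop   # /tmp/hadoop now resolves per-node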