hadoop-common-user mailing list archives

From Joel Welling <well...@psc.edu>
Subject Re: Hadoop over Lustre?
Date Sat, 23 Aug 2008 20:29:18 GMT
So far no success, Konstantin. The hadoop job seems to start up, but it
fails immediately, leaving no logs.  What is the appropriate setting for
mapred.job.tracker?  The generic value references hdfs, but it also has
a port number, and I'm not sure what that means.
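
(For reference, a minimal sketch rather than a confirmed fix: in Hadoop
releases of this era, mapred.job.tracker is normally a plain host:port
pair naming the machine and port where the JobTracker listens, or the
special value "local", with no hdfs:// scheme.  The hostname and port
below are placeholders, and /mnt/lustre is the mount point assumed in
Konstantin's message.)

```xml
<!-- hadoop-site.xml: illustrative values only -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- Lustre mount point, as assumed in the quoted message -->
    <value>file:///mnt/lustre</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- host:port of the JobTracker; hostname and port are hypothetical -->
    <value>jobtracker.example.org:9001</value>
  </property>
</configuration>
```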

My cluster is small, but if I get this working I'd be very happy to run
some benchmarks.  Are there standard tests of hadoop performance?

-Joel
 welling@psc.edu

On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
> I think the solution should be easier than Arun and Steve advise.
> Lustre is already mounted as a local directory on each cluster machine, right?
> Say, it is mounted on /mnt/lustre.
> Then you configure hadoop-site.xml and set
> <property>
>    <name>fs.default.name</name>
>    <value>file:///mnt/lustre</value>
> </property>
> And then you start map-reduce only without hdfs using start-mapred.sh
> 
> By this you basically redirect all FileSystem requests to Lustre and you don't need
> data-nodes or the name-node.
> 
> Please let me know if that works.
> 
> Also it would be very interesting to have your experience shared on this list.
> Problems, performance - everything is quite interesting.
> 
> Cheers,
> --Konstantin
> 
> Joel Welling wrote:
> >> 2. Could you set up symlinks from the local filesystem, so point every 
> >> node at a local dir
> >>   /tmp/hadoop
> >> with each node pointing to a different subdir in the big filesystem?
> > 
> > Yes, I could do that!  Do I need to do it for the log directories as
> > well, or can they be shared?
> > 
> > -Joel
> > 
> > On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
> >> Joel Welling wrote:
> >>> Thanks, Steve and Arun.  I'll definitely try to write something based on
> >>> the KFS interface.  I think that for our applications putting the mapper
> >>> on the right rack is not going to be that useful.  A lot of our
> >>> calculations are going to be disordered stuff based on 3D spatial
> >>> relationships like nearest-neighbor finding, so things will be in a
> >>> random access pattern most of the time.
> >>>
> >>> Is there a way to set up the configuration for HDFS so that different
> >>> datanodes keep their data in different directories?  That would be a big
> >>> help in the short term.
> >> yes, but you'd have to push out a different config to each datanode.
> >>
> >> 1. I have some stuff that could help there, but it's not ready for 
> >> production use yet [1].
> >>
> >> 2. Could you set up symlinks from the local filesystem, so point every 
> >> node at a local dir
> >>   /tmp/hadoop
> >> with each node pointing to a different subdir in the big filesystem?
> >>
> >>
> >> [1] 
> >> http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
> > 
> > 
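
(A sketch of Steve's first option, a different config on each datanode,
under stated assumptions: dfs.data.dir is the property that tells a
datanode where to store its blocks.  The directory layout and the node
name node01 are hypothetical; each datanode would get the same fragment
with its own subdirectory.)

```xml
<!-- hadoop-site.xml fragment for one datanode (hypothetical name: node01) -->
<property>
  <name>dfs.data.dir</name>
  <!-- per-node subdirectory on the shared filesystem; path is illustrative -->
  <value>/mnt/lustre/hadoop/node01/dfs/data</value>
</property>
```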

