hadoop-common-user mailing list archives

From Thomas Bach <thb...@students.uni-mainz.de>
Subject Running Hadoop in User-Space on LSF
Date Fri, 03 Aug 2012 10:43:34 GMT
Hi list,

I'm currently evaluating different scenarios to use Hadoop. I have
access to a Linux cluster running LSF as batch system. I have the idea
to write a small wrapper in Python which

+ generates a Hadoop configuration on a per-job basis
+ formats a per-job HDFS
+ brings up the NameNode and the JobTracker
+ copies all necessary files to HDFS
+ launches the actual Map/Reduce instances
+ when the job is finished, copies the produced files from HDFS
+ shuts down the daemons
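
The steps above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: the hadoop_home path, the /input and /output HDFS paths, and the jar/class arguments are placeholders, and the real wrapper would first render a per-job configuration directory (core-site.xml, mapred-site.xml, slaves) from the hosts LSF assigns. The commands themselves (namenode -format, start-dfs.sh, start-mapred.sh, hadoop fs/jar, the stop scripts) are the standard Hadoop 1.x entry points.

```python
import os
import subprocess

def hadoop_job_commands(input_dir, output_dir, job_jar, job_class,
                        hadoop_home="/opt/hadoop"):
    """Build the command sequence for one self-contained Hadoop run.

    hadoop_home and the HDFS paths are illustrative placeholders; the
    per-job config dir generated beforehand would point dfs.name.dir
    and dfs.data.dir at job-private scratch space.
    """
    bin_ = os.path.join(hadoop_home, "bin")
    hadoop = os.path.join(bin_, "hadoop")
    return [
        # format a fresh per-job HDFS
        [hadoop, "namenode", "-format"],
        # bring up NameNode/DataNodes and JobTracker/TaskTrackers
        [os.path.join(bin_, "start-dfs.sh")],
        [os.path.join(bin_, "start-mapred.sh")],
        # stage the input data into HDFS
        [hadoop, "fs", "-put", input_dir, "/input"],
        # launch the actual Map/Reduce job
        [hadoop, "jar", job_jar, job_class, "/input", "/output"],
        # copy the produced files back out of HDFS
        [hadoop, "fs", "-get", "/output", output_dir],
        # shut the daemons down again
        [os.path.join(bin_, "stop-mapred.sh")],
        [os.path.join(bin_, "stop-dfs.sh")],
    ]

def run_job(input_dir, output_dir, job_jar, job_class):
    """Execute the whole lifecycle, aborting on the first failure."""
    for cmd in hadoop_job_commands(input_dir, output_dir,
                                   job_jar, job_class):
        subprocess.check_call(cmd)
```

Keeping the command list separate from its execution makes the lifecycle easy to log, dry-run, or adapt per cluster before anything is actually started.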

My questions are:
1) Has someone already put some effort into a project similar to this?
2) Do you estimate the overhead of the Hadoop set-up to be too big to
get an actual performance gain?

I assume the answer to (2) depends on the job's running time and the
size of the input data. Thus,
3) What do you think are the characteristics of a job that would see a
performance improvement?
