hadoop-common-dev mailing list archives

From: Arun C Murthy <ar...@yahoo-inc.com>
Subject: Re: Adding new filesystem to Hadoop causing too many Map tasks
Date: Tue, 05 Jun 2007 05:58:30 GMT
Esteban Molina-Estolano wrote:
> Thanks for the advice. I am using an old version.
> I'm trying to upgrade to 0.12.3, but when I try to compile (even  
> without adding in my own code) I get:
> 
> [eestolan@issdm-1 ~/hadoop-0.12.3]$ ant
> Buildfile: build.xml
> 
> init:
> 
> BUILD FAILED
> /cse/grads/eestolan/hadoop-0.12.3/build.xml:114: Specify at least one  
> source--a file or resource collection.
> 

I'm guessing the absence of the *.template files (e.g. 
hadoop-site.xml.template) in the *conf* directory of the release is 
causing this... I'm not sure you can just download the release and compile it...

I'd suggest you check out the 0.12 branch from here:
http://svn.apache.org/repos/asf/lucene/hadoop/branches/branch-0.12/

hth,
Arun

> Total time: 0 seconds
> 
> That line in build.xml has the following:
> 
>     <touch datetime="01/25/1971 2:00 pm">
>       <fileset dir="${conf.dir}" includes="**/*.template"/>
>       <fileset dir="${contrib.dir}" includes="**/*.template"/>
>     </touch>
> 
> What might be causing the error?
> 
> Thanks,
>     ~ Esteban
> 
> 
> On Jun 1, 2007, at 9:26 AM, Owen O'Malley wrote:
> 
>>
>> On Jun 1, 2007, at 1:14 AM, Esteban Molina-Estolano wrote:
>>
>>> I'm having trouble with a small test: RandomWriter, 4 TaskTracker  
>>> nodes, 5 maps per node, 10 MB per map, for a total of 200 MB over  20 
>>> Map tasks. I tried it on Hadoop with DFS, and it took about 30  
>>> seconds. Then, I ran the same test using Ceph. I changed  
>>> fs.default.name to "ceph:///"; added fs.ceph.impl as  
>>> org.apache.hadoop.fs.ceph.CephFileSystem; and left all other  
>>> configuration settings untouched. It ran horrifically slowly.
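
(Aside: a minimal sketch of that configuration done in code, assuming the
0.12-era JobConf API; the same two properties would normally go into
conf/hadoop-site.xml. The class and method names below are illustrative only.)

    import org.apache.hadoop.mapred.JobConf;

    public class CephConfSketch {
      // Point the default filesystem at Ceph and register the class that
      // implements the "ceph" scheme, exactly as described above.
      public static JobConf cephConf() {
        JobConf conf = new JobConf();
        conf.set("fs.default.name", "ceph:///");
        conf.set("fs.ceph.impl", "org.apache.hadoop.fs.ceph.CephFileSystem");
        return conf;
      }
    }
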
>>>
>>> Then the JobTracker spawned 400 Map tasks:
>>>
>>> I'm ending up with way too many Map tasks, and as a result the job  
>>> takes way too long to run.
>>
>>
>> That is really strange, especially because RandomWriter isn't  looking 
>> at any real inputs. (Unless you are using version 0.11 or  earlier of 
>> Hadoop...)  Are you using an old version of Hadoop? If  so, I'd 
>> suspect it has something to do with the blocksize for the  input files 
>> being too small (likely 1 byte or so). You need to  return much bigger 
>> numbers for FileSystem.getBlockSize(Path) or map/reduce will default 
>> to making very small input splits.
>>
>> -- Owen
> 
> 
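
For what it's worth, here is a minimal sketch of the block-size override Owen
is describing, assuming the Ceph filesystem is a FileSystem subclass as
configured above. The class name and the 64 MB figure are illustrative, not
Esteban's actual code; the point is simply that getBlockSize(Path) should
report something DFS-sized rather than a byte or two.

    package org.apache.hadoop.fs.ceph;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Schematic only: advertise a realistic block size so the job client
    // computes a sensible number of input splits instead of one tiny split
    // (and hence one map task) per handful of bytes.
    public abstract class CephFileSystemSketch extends FileSystem {

      private static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB, illustrative

      public long getBlockSize(Path f) {
        return BLOCK_SIZE;
      }

      // Constructors and the remaining FileSystem methods are omitted here.
    }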

