hadoop-common-user mailing list archives

From Moritz Krog <moritzk...@googlemail.com>
Subject Re: Single Node with multiple mappers?
Date Fri, 16 Jul 2010 18:37:01 GMT
Hey :)

thanks for the quick response. My system runs on an i7 with about
8GB of RAM. The problem with my setup is that I'm using Hadoop to
pump 40GB of JSON-encoded data hashes into a MySQL database. The
data is in non-relational form and needs to be normalized before it
can enter the DB, hence the Hadoop approach. I ran the first batch
of test data last night, and ~10GB took ~12h to process (a Python
mapper writing to MySQL via the oursql package).
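
For the curious, the mapper is roughly this shape; the table,
columns, and credentials below are placeholders, not my actual
schema:

    import json
    import sys

    import oursql

    # Placeholder connection settings -- adjust for your own setup.
    conn = oursql.connect(host='127.0.0.1', user='etl', passwd='secret',
                          db='warehouse')
    curs = conn.cursor()

    # Streaming hands the mapper one JSON-encoded hash per stdin line.
    for line in sys.stdin:
        record = json.loads(line)
        # Placeholder normalization: pull the hash apart into columns.
        # (oursql uses qmark-style parameters.)
        curs.execute('INSERT INTO items (id, name) VALUES (?, ?)',
                     (record['id'], record['name']))
        # Emit a key/value line so streaming sees some mapper output.
        sys.stdout.write('%s\t1\n' % record['id'])

    conn.commit()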
The reason for this is certainly the less-than-perfectly configured
mysqld, along with the fact that I gave Hadoop too much memory: I
granted 4GB to Hadoop but forgot that MySQL had about 6GB as well,
so I was 2-3GB into swap most of the time (though I don't know how
much of that was Hadoop and how much was MySQL).
Anyway, I didn't bother to set up the 'real' Hadoop environment with
daemons and HDFS; instead I just ran the streaming jar directly from
$hadoop_home. I don't know if this really matters in any way, I just
thought I'd mention it.
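
Concretely, the run is just something of this shape (the streaming
jar name varies with the Hadoop version, and the paths here are
placeholders):

    bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
        -input /data/json \
        -output /data/out \
        -mapper mapper.py \
        -file mapper.py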

all the best,
Moritz

On Jul 16, 2010, at 11:59 AM, Michael Segel <michael_segel@hotmail.com> wrote:

>
>
> Moritz,
>
> I'm not sure what you're doing, but raising the number of mappers
> in your configuration isn't a 'hint'.
>
> The number of mappers that you can run will depend on your
> configuration. You mention an i7, which is a quad-core CPU, but you
> don't mention the amount of memory you have available or what else
> you run on the machine. You don't want Hadoop to swap.
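
For anyone else hitting this thread: if you do run pseudo-distributed,
the per-node cap on concurrent map tasks is a mapred-site.xml property
(the value here is just illustrative; the 0.20-era default is 2):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>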
>
> If your initial m/r jobs take their input from a file, the default
> behavior is to create one map task per block. So if your input file
> is < 64MB and you have kept the default block size of 64MB, then
> you will only get one map task.
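
Back-of-the-envelope, under that one-map-per-block rule (the sizes
are the ones from this thread):

    # Rough map-task count for a 10GB input at the default 64MB block size.
    file_size_mb = 10 * 1024
    block_size_mb = 64
    map_tasks = -(-file_size_mb // block_size_mb)  # ceiling division
    print(map_tasks)  # -> 160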
>
> I haven't played with Hadoop in a single-node / pseudo-distributed
> environment, just in a distributed one, but I believe the
> functionality is the same.
>
> HTH
>
> -Mike
> PS. Please take my advice with a grain of salt. It's 5:00am and I
> haven't had my first cup of coffee yet. ;-)
>
>> From: moritzkrog@googlemail.com
>> Date: Fri, 16 Jul 2010 11:03:19 +0200
>> Subject: Single Node with multiple mappers?
>> To: common-user@hadoop.apache.org
>>
>> Hi everyone,
>>
>> I was curious whether there is any option to use Hadoop in
>> single-node mode in a way that lets the process use more system
>> resources. Right now, Hadoop uses one mapper and one reducer,
>> leaving my i7 at about 20% CPU usage (1 core for Hadoop, .5 cores
>> for my OS), basically idling.
>> Raising the number of map tasks doesn't seem to do much, as this
>> parameter seems to be more of a hint anyway. Still, I have lots of
>> CPU time and RAM left. Any hints on how to use them?
>>
>> thanks in advance,
>> Moritz
