hadoop-common-user mailing list archives

From Chris Dyer <redp...@umd.edu>
Subject Best practices with large-memory jobs
Date Tue, 15 Sep 2009 05:42:45 GMT
Hello Hadoopers-
I'm attempting to run some large-memory map tasks using Hadoop
streaming, but I seem to be running afoul of the mapred.child.ulimit
restriction, which is set to 2097152.  I assume this is in KB since my
tasks fail when they get to about 2GB (I just need to get to about
2.3GB- almost there!).  So far, nothing I've tried has succeeded in
changing this value. I've attempted to add
-jobconf mapred.child.ulimit=3000000
to the streaming command line, but to no avail.  In the job's xml file
that I find in my logs, it's still got the old value.  And worse, in
my task logs I see the message:
"attempt to override final parameter: mapred.child.ulimit;  Ignoring."
which doesn't exactly inspire confidence that I'm on the right path.
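(For context on that warning: a Hadoop property can't be overridden per-job when the cluster's hadoop-site.xml marks it final. A sketch of what such an entry might look like on the cluster — the value here is just the default I'm seeing, and the entry itself is my guess at the cause, not something I've confirmed in the cluster config:)

```xml
<!-- Hypothetical entry in the cluster's hadoop-site.xml.
     A property marked <final>true</final> cannot be overridden by
     per-job settings such as streaming's -jobconf flag, which is
     what triggers the "attempt to override final parameter" warning. -->
<property>
  <name>mapred.child.ulimit</name>
  <value>2097152</value>
  <final>true</final>
</property>
```

If that's right, presumably the fix is to raise the value (or drop the final flag) in the cluster config and restart the TaskTrackers, rather than anything on the job submission side.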

I see there's been a fair amount of traffic on Jira about large memory
jobs, but there doesn't seem to be much in the way of examples or
documentation.  Can someone tell me how to run such a job, especially
a streaming job?
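
For reference, the invocation I've been trying looks roughly like this (paths, jar name, and the mapper are placeholders; the mapred.child.java.opts override is a related memory knob I've also been experimenting with, not something from the docs):

```
# Hypothetical streaming invocation on 0.18.x; paths and mapper are placeholders.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.18.3-streaming.jar \
  -jobconf mapred.child.ulimit=3000000 \
  -jobconf mapred.child.java.opts=-Xmx2500m \
  -input /user/me/input \
  -output /user/me/output \
  -mapper ./big_memory_mapper \
  -reducer NONE
```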

Many thanks in advance--
ps. I'm running a Hadoop 0.18.3 cluster on Amazon EC2 (I've been using
the Cloudera convenience scripts, but I can abandon them if I need more
control). The instances have plenty of memory (7.5GB).
