hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Out of memory (heap space) errors on job tracker
Date Mon, 11 Jun 2012 00:39:15 GMT
Harsh - I'd be inclined to think it's worse than just setting mapreduce.jobtracker.completeuserjobs.maximum
- the only case that setting would solve is a single user submitting 25 *large* jobs (in terms of
task count) within a single 24-hr window.
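
For reference, a minimal mapred-site.xml sketch of that setting (using the hadoop-1.x property name that Harsh quotes below; the value of 5 is just his suggested retention, not a recommendation from this thread's measurements):

```xml
<!-- mapred-site.xml: cap how many completed jobs per user the
     JobTracker retains in memory (default is 100) -->
<property>
  <name>mapred.jobtracker.completeuserjobs.maximum</name>
  <value>5</value>
</property>
```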

David - I'm guessing you aren't using the CapacityScheduler - that would give you more
controls, limits on jobs, etc.

More details here: http://hadoop.apache.org/common/docs/r1.0.3/capacity_scheduler.html

In particular, look at the example config there and let us know if you need help understanding
any of it.
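
As a rough starting point, a sketch of enabling the CapacityScheduler on hadoop-1.x (the queue name "adhoc" and the 70/30 split are made up for this example; see the linked docs for the full set of per-queue limits):

```xml
<!-- mapred-site.xml: switch the JobTracker to the CapacityScheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <!-- "adhoc" is an illustrative second queue -->
  <value>default,adhoc</value>
</property>

<!-- capacity-scheduler.xml: give each queue a share of cluster slots;
     the percentages here are made up for this sketch -->
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>70</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.adhoc.capacity</name>
  <value>30</value>
</property>
```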

Arun

On Jun 9, 2012, at 10:40 PM, Harsh J wrote:

> Hey David,
> 
> Primarily you'd need to lower
> "mapred.jobtracker.completeuserjobs.maximum" in your mapred-site.xml
> to a value < 25. I recommend 5 if you don't need much retention
> of job info per user. This will keep the JT's live memory usage in
> check and stop your crashes, instead of you having to raise your
> heap all the time. There's no "leak", but this config's default of
> 100 causes real problems for a JT that runs a lot of jobs per day
> (from several users).
> 
> Try it out and let us know!
> 
> On Sat, Jun 9, 2012 at 12:37 AM, David Rosenstrauch <darose@darose.net> wrote:
>> We're running 0.20.2 (Cloudera cdh3u4).
>> 
>> What configs are you referring to?
>> 
>> Thanks,
>> 
>> DR
>> 
>> 
>> On 06/08/2012 02:59 PM, Arun C Murthy wrote:
>>> 
>>> This shouldn't be happening at all...
>>> 
>>> What version of hadoop are you running? You may be missing configs that
>>> protect the JT; with those in place, your hadoop-1.x JT
>>> should be very reliable.
>>> 
>>> Arun
>>> 
>>> On Jun 8, 2012, at 8:26 AM, David Rosenstrauch wrote:
>>> 
>>>> Our job tracker has been seizing up with Out of Memory (heap space)
>>>> errors for the past 2 nights. After the first night's crash, I doubled the
>>>> heap space (from the default of 1GB) to 2GB before restarting it.
>>>> After last night's crash I doubled it again to 4GB.
>>>> 
>>>> This all seems a bit puzzling to me.  I wouldn't have thought that the
>>>> job tracker should require so much memory.  (The NameNode, yes, but not the
>>>> job tracker.)
>>>> 
>>>> Just wondering if this behavior sounds reasonable, or if perhaps there
>>>> might be a bigger problem at play here.  Anyone have any thoughts on the
>>>> matter?
>>>> 
>>>> Thanks,
>>>> 
>>>> DR
>>> 
>>> 
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/


