hadoop-user mailing list archives

From Viswanathan J <jayamviswanat...@gmail.com>
Subject Re: Hadoop Jobtracker heap size calculation and OOME
Date Mon, 14 Oct 2013 09:37:01 GMT
Thanks a lot, Antonios.

I'm using Apache Hadoop; I hope this issue will be resolved in an upcoming
Apache Hadoop release.

Do I need to restart the whole cluster after changing the mapred-site
configuration as you mentioned?

Also, what about the following bug id:

https://issues.apache.org/jira/browse/MAPREDUCE-5351

Is this issue different from the OOME? They mention that it has been
fixed.

Thanks,
Viswa.J
 On Oct 14, 2013 2:44 PM, "Antonios Chalkiopoulos" <antwnis@gmail.com>
wrote:

> In *mapred-site.xml* you need the following snippet:
>
> <property>
>   <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
>   <value>100</value>
> </property>
> <property>
>   <name>keep.failed.task.files</name>
>   <value>true</value>
> </property>
> <property>
>   <name>keep.task.files.pattern</name>
>   <value>shouldnevereverevermatch</value>
> </property>
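>
> These are JobTracker-side settings, so restarting just the JobTracker
> should be enough to pick them up - no full-cluster restart. A minimal
> sketch, assuming a Hadoop 1.x layout with the standard daemon scripts
> under $HADOOP_HOME (adjust the path and user to your install):
>
> # on the JobTracker host, as the user running the daemon
> $HADOOP_HOME/bin/hadoop-daemon.sh stop jobtracker
> $HADOOP_HOME/bin/hadoop-daemon.sh start jobtracker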
>
>
> This will fix the memory leak issue (the official fix, I think, is
> available in Cloudera's CDH 4.6 distribution).
> It will cause another issue: the .staging files are no longer removed
> from the /user/*/.staging/ location.
>
>
> To overcome this, use a daily Jenkins job (or cron) such as:
>
> #!/bin/bash
> # Delete .staging directories older than 7 days.
> # Note: the mktime/strftime calls require GNU awk (gawk).
> LAST_DATE=$(date -ud '-7days' +%s)
> hdfs dfs -ls /user/*/.staging | awk '/^d/ {
>     m_date=$6; gsub("-", " ", m_date)                  # "2013-10-07" -> "2013 10 07"
>     ep_date=strftime("%s", mktime(m_date" 00 00 00"))  # listing date as epoch seconds
>     if (ep_date <= l_date) print $8                    # $8 is the directory path
>   }' l_date=$LAST_DATE | xargs -P 2 --verbose hdfs dfs -rm -r -skipTrash
>
>
> The above will remove all directories that were created more than 7
> days ago and will keep your HDFS clean.
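>
> To preview what would be deleted first, swap the final xargs ... -rm
> stage for a plain cat. Once you're happy, a sketch of the cron wiring,
> assuming the script is saved at a hypothetical /usr/local/bin/clean-staging.sh
> and runs as a user with rights over /user/*/.staging:
>
> # crontab entry: run the cleanup daily at 03:00
> 0 3 * * * /usr/local/bin/clean-staging.sh >> /var/log/clean-staging.log 2>&1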
>
>
>
> On Monday, 14 October 2013 09:52:41 UTC+1, Viswanathan J wrote:
>>
>> Hi guys,
>>
>> Appreciate your response.
>>
>> Thanks,
>> Viswa.J
>> On Oct 12, 2013 11:29 PM, "Viswanathan J" <jayamvis...@gmail.com> wrote:
>>
>>> Hi Guys,
>>>
>>> But I can see that the JobTracker OOME issue is marked as fixed in
>>> Hadoop 1.2.1, as per the Hadoop release notes below.
>>>
>>> Please check this URL,
>>>
>>> https://issues.apache.org/jira/browse/MAPREDUCE-5351
>>>
>>> How come the issue still persists? Am I asking a valid thing?
>>>
>>> Do I need to configure anything, or am I missing anything?
>>>
>>> Please help. Appreciate your response.
>>>
>>> Thanks,
>>> Viswa.J
>>> On Oct 12, 2013 7:57 PM, "Viswanathan J" <jayamvis...@gmail.com> wrote:
>>>
>>>> Thanks Antonios, I hope the memory leak issue will be resolved. It's
>>>> really a nightmare every week.
>>>>
>>>> In which release will this issue be resolved?
>>>>
>>>> How do we solve this issue? Please help, because we are facing it in a
>>>> production environment.
>>>>
>>>> Please share the configuration and cron to do that cleanup process.
>>>>
>>>> Thanks,
>>>> Viswa
>>>> On Oct 12, 2013 7:31 PM, "Antonios Chalkiopoulos" <ant...@gmail.com>
>>>> wrote:
>>>>
>>>>> "After restart the JT, within a week getting OOME."
>>>>>
>>>>> Viswa, we were having the same issue in our cluster as well - roughly
>>>>> every 5-7 days we were getting an OOME.
>>>>> The heap size of the JobTracker was constantly increasing due to a
>>>>> memory leak that will hopefully be fixed in newer releases.
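>>>>>
>>>>> One way to watch that trend for yourself, assuming the JDK's jstat
>>>>> tool is on the PATH and you know the JobTracker pid (both assumptions
>>>>> for your install):
>>>>>
>>>>> # sample heap/old-gen utilisation every 60s; a steady climb that never
>>>>> # drops back after full GCs is the signature of the leak
>>>>> jstat -gcutil <jobtracker-pid> 60000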
>>>>>
>>>>> There is a configuration change for the JobTracker that disables the
>>>>> functionality of cleaning up staging files, i.e.
>>>>> /user/build/.staging/* - but that means you will have to handle the
>>>>> staging files yourself through a cron / Jenkins task.
>>>>>
>>>>> I'll get you the configuration on Monday.
>>>>>
>>>>> On Friday, 11 October 2013 18:08:55 UTC+1, Viswanathan J wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm running a 14-node Hadoop cluster with DataNodes and TaskTrackers
>>>>>> running on all nodes.
>>>>>>
>>>>>> *Apache Hadoop:* 1.2.1
>>>>>>
>>>>>> The JobTracker web UI currently shows the heap size as follows:
>>>>>>
>>>>>> *Cluster Summary (Heap Size is 5.7/8.89 GB)*
>>>>>>
>>>>>> In the above summary, what does the *8.89* GB define? Is *8.89* the
>>>>>> maximum heap size for the JobTracker, and if yes, how has it been
>>>>>> calculated?
>>>>>>
>>>>>> I assume *5.7* is the heap size used by currently running jobs; how is
>>>>>> it calculated?
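>>>>>>
>>>>>> The two numbers are reported by the JobTracker JVM itself: its current
>>>>>> heap usage versus the maximum heap it may grow to (its -Xmx). A sketch
>>>>>> to cross-check them on the JobTracker host, assuming the stock JDK
>>>>>> jmap tool and the Hadoop 1.x default pid-file location - both
>>>>>> assumptions to adjust for your install:
>>>>>>
>>>>>> # show the configured and currently used heap of the JobTracker JVM
>>>>>> JT_PID=$(cat ${HADOOP_PID_DIR:-/tmp}/hadoop-*-jobtracker.pid)
>>>>>> jmap -heap "$JT_PID"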
>>>>>>
>>>>>> I have set the JobTracker default heap size in hadoop-env.sh:
>>>>>>
>>>>>> *HADOOP_HEAPSIZE="1024"*
>>>>>> I have set the mapred.child.java.opts value in mapred-site.xml as:
>>>>>>
>>>>>>  <property>
>>>>>>   <name>mapred.child.java.opts</name>
>>>>>>   <value>-Xmx2048m</value>
>>>>>>  </property>
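>>>>>>
>>>>>> Note that mapred.child.java.opts only sizes the task JVMs launched on
>>>>>> the TaskTrackers; it does not affect the JobTracker's own heap. To
>>>>>> raise the JobTracker heap specifically, a sketch for hadoop-env.sh
>>>>>> (the Hadoop 1.x per-daemon override; the 4 GB value is only an
>>>>>> example, not a recommendation):
>>>>>>
>>>>>> # hadoop-env.sh: larger heap for the JobTracker daemon only
>>>>>> export HADOOP_JOBTRACKER_OPTS="-Xmx4096m $HADOOP_JOBTRACKER_OPTS"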
>>>>>>
>>>>>> Even after setting the above property, I am getting the JobTracker
>>>>>> OOME issue. Why does the JobTracker memory gradually increase? After
>>>>>> restarting the JT, we get an OOME within a week.
>>>>>>
>>>>>> How do I resolve this? It is in production and critical. Please help.
>>>>>> Thanks in advance.
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Viswa.J
>>>>>>
