hadoop-mapreduce-user mailing list archives

From Viswanathan J <jayamviswanat...@gmail.com>
Subject Re: Hadoop Jobtracker heap size calculation and OOME
Date Tue, 15 Oct 2013 02:07:12 GMT
Hi,

Not yet updated in the production environment. Will keep you posted once it
is done.

In which Apache Hadoop release will this issue be fixed? Or is it already
fixed in Hadoop 1.2.1, as per the link below?

https://issues.apache.org/jira/browse/MAPREDUCE-5351

Please confirm.

Thanks,
On Oct 15, 2013 3:43 AM, "Antwnis" <antwnis@gmail.com> wrote:

> Viswana,
>
> Please confirm :) whether the issue was fixed - for future readers of
> this thread.
>
> With this configuration, after restarting the JobTracker you should see
> on the JobTracker page that the memory usage remains low over time.
>
> Antonios
>
>
> On Mon, Oct 14, 2013 at 10:56 AM, Antwnis <antwnis@gmail.com> wrote:
>
>> After changing mapred-site.xml, you will have to restart the JobTracker
>> to have the changes applied to it.
>>
>>
>> On Mon, Oct 14, 2013 at 10:37 AM, Viswanathan J <
>> jayamviswanathan@gmail.com> wrote:
>>
>>> Thanks a lot, Antonio.
>>>
>>> I'm using Apache Hadoop; I hope this issue will be resolved in
>>> upcoming Apache Hadoop releases.
>>>
>>> Do I need to restart the whole cluster after changing the mapred-site
>>> conf as you mentioned?
>>>
>>> What about the following bug id:
>>>
>>>
>>> https://issues.apache.org/jira/browse/MAPREDUCE-5351
>>>
>>> Is this issue different from the OOME? They mentioned that issue is
>>> fixed.
>>>
>>> Thanks,
>>> Viswa.J
>>>  On Oct 14, 2013 2:44 PM, "Antonios Chalkiopoulos" <antwnis@gmail.com>
>>> wrote:
>>>
>>>> In *mapred-site.xml* you need the following snippet:
>>>>
>>>> <property>
>>>> <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
>>>> <value>100</value>
>>>> </property>
>>>> <property>
>>>> <name>keep.failed.task.files</name>
>>>> <value>true</value>
>>>> </property>
>>>> <property>
>>>> <name>keep.task.files.pattern</name>
>>>> <value>shouldnevereverevermatch</value>
>>>> </property>
>>>>
>>>>
>>>> This will fix the memory leak issue (the official fix, I think, is
>>>> available in Cloudera's CDH 4.6 distribution).
>>>> It will cause another issue, though: the .staging files under
>>>> /user/*/.staging/ will no longer be removed.
>>>>
>>>>
>>>> To overcome this, use a daily Jenkins job (or cron) that runs:
>>>>
>>>> #!/bin/bash
>>>> LAST_DATE=$(date -ud '-7days' +%s)
>>>> hdfs dfs -ls /user/*/.staging | \
>>>>   awk '/^d/ {m_date=$6; gsub("-"," ",m_date);
>>>>        ep_date=strftime("%s", mktime(m_date" 00 00 00"));
>>>>        if (ep_date <= l_date) print $8}' l_date=$LAST_DATE | \
>>>>   xargs -P 2 --verbose hdfs dfs -rm -r -skipTrash
>>>>
>>>> The above will remove all directories that were created more than 7
>>>> days ago, and will keep your HDFS clean.
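For anyone adapting that cleanup, the 7-day cutoff can be sanity-checked without touching HDFS. A minimal sketch (GNU date assumed; `is_candidate` is a hypothetical helper, not part of Hadoop) that applies the same "modification date at least 7 days old" test the awk filter uses:

```shell
#!/bin/bash
# Recreate the cleanup's age test in plain shell: a staging directory is
# a deletion candidate when its modification date (column 6 of
# 'hdfs dfs -ls', e.g. 2013-10-01) is 7 or more days in the past.
LAST_DATE=$(date -ud '-7days' +%s)

is_candidate() {   # $1 = YYYY-MM-DD as printed by 'hdfs dfs -ls'
    local ep_date
    ep_date=$(date -ud "$1" +%s)   # midnight of that day, UTC
    [ "$ep_date" -le "$LAST_DATE" ]
}

is_candidate 2013-10-01 && echo "2013-10-01: would delete"
is_candidate "$(date -u +%Y-%m-%d)" || echo "today: would keep"
```

Running the original one-liner with the destructive `hdfs dfs -rm -r -skipTrash` replaced by `xargs echo` gives the same kind of dry run against real listings.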
>>>>
>>>>
>>>>
>>>> On Monday, 14 October 2013 09:52:41 UTC+1, Viswanathan J wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> Appreciate your response.
>>>>>
>>>>> Thanks,
>>>>> Viswa.J
>>>>> On Oct 12, 2013 11:29 PM, "Viswanathan J" <jayamvis...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> But I can see the JobTracker OOME issue marked as fixed in Hadoop
>>>>>> 1.2.1, as per the Hadoop release notes below.
>>>>>>
>>>>>> Please check this URL,
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-5351
>>>>>>
>>>>>> How come the issue still persists? Am I asking a valid thing?
>>>>>>
>>>>>> Do I need to configure anything, or am I missing anything?
>>>>>>
>>>>>> Please help. Appreciate your response.
>>>>>>
>>>>>> Thanks,
>>>>>> Viswa.J
>>>>>> On Oct 12, 2013 7:57 PM, "Viswanathan J" <jayamvis...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Antonio, hope the memory leak issue will be resolved. It's
>>>>>>> really a nightmare every week.
>>>>>>>
>>>>>>> In which release will this issue be resolved?
>>>>>>>
>>>>>>> How do we solve this issue? Please help, because we are facing it
>>>>>>> in a production environment.
>>>>>>>
>>>>>>> Please share the configuration and cron to do that cleanup process.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Viswa
>>>>>>> On Oct 12, 2013 7:31 PM, "Antonios Chalkiopoulos" <ant...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> "After restart the JT, within a week getting OOME."
>>>>>>>>
>>>>>>>> Viswa, we were having the same issue in our cluster as well -
>>>>>>>> roughly every 5-7 days getting OOME.
>>>>>>>> The heap size of the JobTracker was constantly increasing due to a
>>>>>>>> memory leak that will hopefully be fixed in newer releases.
>>>>>>>>
>>>>>>>> There is a configuration change in the JobTracker that will disable
>>>>>>>> the functionality of cleaning up staging files, i.e.
>>>>>>>> /user/build/.staging/* - but that means that you will have to
>>>>>>>> handle the staging files through a cron / Jenkins task.
>>>>>>>>
>>>>>>>> I'll get you the configuration on Monday..
>>>>>>>>
>>>>>>>> On Friday, 11 October 2013 18:08:55 UTC+1, Viswanathan J wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm running a 14-node Hadoop cluster with datanodes and
>>>>>>>>> tasktrackers running on all nodes.
>>>>>>>>>
>>>>>>>>> *Apache Hadoop:* 1.2.1
>>>>>>>>>
>>>>>>>>> It shows the heap size currently as follows:
>>>>>>>>>
>>>>>>>>> *Cluster Summary (Heap Size is 5.7/8.89 GB)*
>>>>>>>>>
>>>>>>>>> In the above summary, what does the *8.89* GB define? Does *8.89*
>>>>>>>>> define the maximum heap size for the JobTracker, and if yes, how
>>>>>>>>> has it been calculated?
>>>>>>>>>
>>>>>>>>> I hope *5.7* is the heap size of the currently running jobs; how
>>>>>>>>> is it calculated?
>>>>>>>>>
>>>>>>>>> I have set the jobtracker default memory size in hadoop-env.sh:
>>>>>>>>>
>>>>>>>>> HADOOP_HEAPSIZE="1024"
>>>>>>>>>
>>>>>>>>> I have set the mapred.child.java.opts value in mapred-site.xml as:
>>>>>>>>>
>>>>>>>>>  <property>
>>>>>>>>>   <name>mapred.child.java.opts</name>
>>>>>>>>>   <value>-Xmx2048m</value>
>>>>>>>>>  </property>
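A side note for readers with the same symptom (drawn from the Hadoop 1.x defaults, not stated in this thread): mapred.child.java.opts only sizes the child JVMs that run map and reduce tasks on the tasktrackers, so it does not bound the JobTracker's own heap. The daemon heap is set in hadoop-env.sh, roughly like this (the 4096m figure is an arbitrary example, not a recommendation):

```shell
# hadoop-env.sh (Hadoop 1.x) - sketch of the daemon-side heap settings.
# HADOOP_HEAPSIZE is the default -Xmx, in MB, applied to every daemon.
export HADOOP_HEAPSIZE=1024
# HADOOP_JOBTRACKER_OPTS adds JobTracker-only JVM flags, so its heap can
# be raised without also raising the datanode/tasktracker daemons.
export HADOOP_JOBTRACKER_OPTS="-Xmx4096m $HADOOP_JOBTRACKER_OPTS"
```

With a leak like the one discussed here, a bigger heap only delays the OOME; the mapred-site.xml workaround above is still what stops the growth.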
>>>>>>>>>
>>>>>>>>> Even after setting the above property, I am getting the JobTracker
>>>>>>>>> OOME issue. Why is the JobTracker memory gradually increasing?
>>>>>>>>> After restarting the JT, I get an OOME within a week.
>>>>>>>>>
>>>>>>>>> How do I resolve this? It is in production and critical. Please
>>>>>>>>> help. Thanks in advance.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Viswa.J
>>>>>>>>>
>>>>>>>>  --
>>>>>>>>
>>>>>>>> ---
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "CDH Users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to cdh-user+u...@cloudera.org.
>>>>>>>> For more options, visit
>>>>>>>> https://groups.google.com/a/cloudera.org/groups/opt_out.
>>>>>>>>
>>>>>>>   --
>>>>
>>>
>>
>>
>> --
>>
>
>
>
> --
>
