hadoop-user mailing list archives

From Boyu Zhang <boyuzhan...@gmail.com>
Subject Re: Yarn ResourceManager web UI does not show job
Date Tue, 22 Sep 2015 01:38:18 GMT
Thanks a lot for the answer!

If you don't mind helping a bit more with this, here is what I am seeing.

- The NameNode/DataNode and ResourceManager/NodeManager had been running for
6 months before I discovered that the job history server was not running.
After bringing up the job history server, I saw 2k+ jobs in the history
server web UI. But then the job history server was restarted, and now I
don't see any jobs more than 7 days old in the history web UI.

- I've disabled the cleaner in the config file.

My question is, is there a way to find/recover the job history files more
than 7 days old? I read that the container logs are stored locally in the
NodeManager user log dir, and there are files there (I have not dug through
them yet). I am not sure whether the job history files deleted by the
history cleaner can be recovered easily.

Thanks in advance,
Boyu


On Mon, Sep 21, 2015 at 4:35 PM, Varun Saxena <vsaxena.varun@gmail.com>
wrote:

> MR jobs will write history files to the path given by config
> mapreduce.jobhistory.intermediate-done-dir.
> History server will then move them to the done dir, which is given by config
> mapreduce.jobhistory.done-dir.
>
> By default these config values
> are ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate
> and ${yarn.app.mapreduce.am.staging-dir}/history/done respectively.
>
> The 7 days is also configurable (the config being
> mapreduce.jobhistory.max-age-ms). You can set this value according to your
> cluster.
>
> I hope this answers your question.
>
> Regards,
> Varun Saxena.
>
> On Tue, Sep 22, 2015 at 1:39 AM, Boyu Zhang <boyuzhang35@gmail.com> wrote:
>
>> Thanks a lot for the clarification!
>>
>> I tried to find the log and history information about finished jobs. But
>> they are not in hdfs://xxx/user/myusername/output/_SUCCESS (0B). Can you
>> please give some pointers on where the statistical/job history files are
>> located? The hdfs://xxxx/history/done only stores history files up to 7 days.
>>
>> Thanks,
>> Boyu
>>
>> On Mon, Sep 21, 2015 at 1:23 PM, Varun Saxena <vsaxena.varun@gmail.com>
>> wrote:
>>
>>> No, you can't show them in the RM UI then.
>>>
>>> However, if you can start another daemon, you can consider using the YARN
>>> Application History/Timeline Server or the MR Job History Server (only for
>>> MR jobs) to see information about completed jobs.
>>> You can look up Hadoop documentation to learn more about them and how to
>>> configure them.
>>>
>>> Just to clarify though, the apps themselves are not lost, as in, the
>>> output is not lost. It's just the information about them which is no
>>> longer present on RM restart.
>>>
>>> Regards,
>>> Varun Saxena.
>>>
>>> On Mon, Sep 21, 2015 at 10:31 PM, Boyu Zhang <boyuzhang35@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the answer Varun.
>>>>
>>>> It is the case that yarn.resourcemanager.recovery.enabled is set to
>>>> false. Is there a way to show the jobs that were submitted before the
>>>> restart? We don't want to lose that data.
>>>>
>>>> Thanks,
>>>> Boyu
>>>>
>>>>
>>>> On Mon, Sep 21, 2015 at 12:53 PM, Varun Saxena <vsaxena.varun@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Boyu,
>>>>>
>>>>> RM stores apps in the state store if recovery is enabled. Only then will
>>>>> they be available on restart.
>>>>> Otherwise they are kept in memory and hence lost on restart.
>>>>>
>>>>> You may not have it enabled. Check the value of the config below. By
>>>>> default it's false.
>>>>> yarn.resourcemanager.recovery.enabled
>>>>>
>>>>> Regards,
>>>>> Varun.
>>>>>
>>>>> On Mon, Sep 21, 2015 at 10:01 PM, Boyu Zhang <boyuzhang35@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Everyone,
>>>>>>
>>>>>> I have a strange error regarding the ResourceManager web UI (
>>>>>> http://xx.xx:8088).
>>>>>>
>>>>>> Someone before me set up the hadoop + yarn cluster using Pivotal HD,
>>>>>> and it was running fine. Then today, the resource manager and node
>>>>>> manager disappeared, and the logs did not record this. I restarted
>>>>>> them and they are up and running, but the resource manager web UI does
>>>>>> not show any jobs. We have run 700+ jobs in the past, and they were
>>>>>> showing before.
>>>>>>
>>>>>> If I submit MapReduce jobs, the newly submitted ones show up. But they
>>>>>> disappear again after restarting the resource manager and node manager.
>>>>>>
>>>>>> Can anyone give any hint on where to look?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Boyu
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
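
For anyone following the thread, the settings it mentions can be checked with a short script. The following is a minimal sketch, not taken from any of the messages: the helper names (days_to_ms, history_dirs, read_properties) and the sample values are hypothetical. It assumes the usual Hadoop <configuration>/<property> XML layout. The sample puts a MapReduce and a YARN property in one document purely for brevity; on a real cluster they would normally live in separate files (mapred-site.xml and yarn-site.xml).

```python
import xml.etree.ElementTree as ET

MS_PER_DAY = 24 * 60 * 60 * 1000


def days_to_ms(days):
    """Convert a retention period in days to a value for mapreduce.jobhistory.max-age-ms."""
    return int(days * MS_PER_DAY)


def history_dirs(staging_dir):
    """Return (intermediate-done-dir, done-dir) as derived from the staging dir,
    following the default pattern quoted in the thread."""
    base = staging_dir.rstrip("/") + "/history"
    return base + "/done_intermediate", base + "/done"


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Hypothetical sample config: keep history for 30 days and enable RM recovery.
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.jobhistory.max-age-ms</name>
    <value>%d</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
</configuration>""" % days_to_ms(30)

props = read_properties(SAMPLE)

# A property missing from the file falls back to the default; per the thread,
# recovery defaults to false.
recovery_enabled = props.get("yarn.resourcemanager.recovery.enabled", "false") == "true"
max_age_days = int(props["mapreduce.jobhistory.max-age-ms"]) / MS_PER_DAY

print("RM recovery enabled:", recovery_enabled)
print("History retention (days):", max_age_days)
# The staging dir below is a hypothetical example value.
print("History dirs:", history_dirs("/user/staging"))
```

Raising the retention only affects history files that still exist; it does not bring back files the cleaner has already removed.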
