hadoop-hdfs-user mailing list archives

From Drake민영근 <drake....@nexr.com>
Subject Re: tracking remote reads in datanode logs
Date Tue, 24 Feb 2015 22:46:41 GMT
Hi, Igor

The AM logs are stored in HDFS if log aggregation is enabled. Otherwise,
they are in the NodeManager's container log directory. See this:
http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/

Thanks

On Wednesday, February 25, 2015, Igor Bogomolov <igor.bogomolov@gmail.com> wrote:

> Hi Drake,
>
> Thanks for the pointer. The AM log does indeed have information about remote
> map tasks, but I'd like more low-level details, such as which node each
> map task was scheduled on and how many bytes were read. That should be
> in the datanode log, and I saw it there for another job. But after I
> reinstalled the cluster, it's not there anymore :(
>
> Could you please tell me the path where the AM log is located (the one you
> copied the lines from)? I found it in the web interface but not as a file
> on disk, and there is nothing in /var/log/hadoop-*
>
> Thanks,
> Igor
>
> On Tue, Feb 24, 2015 at 1:51 AM, Drake민영근 <drake.min@nexr.com> wrote:
>
>> I found this in the mapreduce am log.
>>
>> 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
>> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
>> Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
>> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
>> HostLocal:0 RackLocal:0
>> ..
>> 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
>> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
>> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
>> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
>> HostLocal:3 RackLocal:2
>> ..
>>
>> The first line shows 5 scheduled map tasks, and the second shows
>> HostLocal:3 and RackLocal:2. I think the 2 rack-local tasks are the remote
>> map tasks you mentioned before.
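
As a rough illustration of reading those counters programmatically, here is a minimal Python sketch. It assumes the wrapped log entry has been joined back into a single line; the function name `parse_scheduling_counters` is hypothetical and not part of Hadoop.

```python
import re

# Matches "Key:Value" counter pairs such as "HostLocal:3" in an
# RMContainerAllocator "Before/After Scheduling" log line.
COUNTER_RE = re.compile(r"(\w+):(\d+)")


def parse_scheduling_counters(line):
    """Return the counters that follow 'Scheduling:' as a dict of ints."""
    # Only look at the text after "Scheduling:" so the timestamp is ignored.
    _, sep, tail = line.partition("Scheduling:")
    if not sep:
        return {}
    return {key: int(value) for key, value in COUNTER_RE.findall(tail)}


# The "After Scheduling" entry quoted above, joined into one line.
after = (
    "2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After "
    "Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 "
    "AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 "
    "HostLocal:3 RackLocal:2"
)

counters = parse_scheduling_counters(after)
# Maps that were assigned but are not host-local read from another node.
not_host_local = counters["AssignedMaps"] - counters["HostLocal"]
print(counters["HostLocal"], counters["RackLocal"], not_host_local)
```

For this entry the difference between AssignedMaps and HostLocal equals the RackLocal count, which matches the reading given above.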
>>
>>
>> Drake 민영근 Ph.D
>> kt NexR
>>
>> On Tue, Feb 24, 2015 at 9:45 AM, Drake민영근 <drake.min@nexr.com> wrote:
>>
>>> Hi, Igor
>>>
>>> Did you look at the mapreduce application master log? I think the local
>>> or rack local map tasks are logged in the MapReduce AM log.
>>>
>>> Good luck.
>>>
>>> Drake 민영근 Ph.D
>>> kt NexR
>>>
>>> On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov
>>> <igor.bogomolov@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In a small cluster of 5 nodes running CDH 5.3.0 (Hadoop 2.5.0), I want
>>>> to know how many remote map tasks (ones that read input data from remote
>>>> nodes) there are in a mapreduce job. For this purpose I took the logs of
>>>> each datanode and looked for lines with "op: HDFS_READ" and a cliID field
>>>> that contains a map task id.
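
The search described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: each entry is taken to be a single line containing "op: HDFS_READ" and a "cliID: <id>" field, map attempt IDs are assumed to contain "_m_", and the sample lines are synthetic placeholders rather than real datanode output. The function name `map_task_reads` is hypothetical.

```python
import re

# Assumption: the client ID appears as "cliID: <id>" and ends at a comma
# or whitespace. All other fields on the line are ignored.
CLI_ID_RE = re.compile(r"cliID: ([^,\s]+)")


def map_task_reads(lines):
    """Yield the cliID of each HDFS_READ line whose cliID looks like a map attempt."""
    for line in lines:
        if "op: HDFS_READ" not in line:
            continue
        match = CLI_ID_RE.search(line)
        # Assumption: map task attempt IDs contain "_m_".
        if match and "_m_" in match.group(1):
            yield match.group(1)


# Synthetic lines for illustration only; real entries carry more fields.
sample = [
    "... op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_1_1, ...",
    "... op: HDFS_READ, cliID: DFSClient_attempt_0001_m_000000_0_1_1, ...",
    "... op: HDFS_WRITE, cliID: DFSClient_attempt_0001_m_000001_0_1_1, ...",
]

print(list(map_task_reads(sample)))
```

Only the second synthetic line is reported: the first has a non-MapReduce client ID and the third is not a read.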
>>>>
>>>> Surprisingly, 4 datanode logs do not contain any lines with "op: HDFS_READ".
>>>> The remaining one has many lines with "op: HDFS_READ", but every cliID looks
>>>> like DFSClient_NONMAPREDUCE_* and does not contain a map task id.
>>>>
>>>> I concluded there are no remote map tasks, but that does not look
>>>> correct. Also, even local reads are not logged (there is no line
>>>> where the cliID field contains a map task id). Could anyone please
>>>> explain what's wrong? Why is logging not working? (I use the default settings.)
>>>>
>>>> Chris,
>>>>
>>>> Found HADOOP-3062 <https://issues.apache.org/jira/browse/HADOOP-3062>
>>>> that you have implemented. Thought you might have an explanation.
>>>>
>>>> Best,
>>>> Igor
>>>>
>>>>
>>>
>>
>

-- 
Drake 민영근 Ph.D
kt NexR
