hadoop-common-user mailing list archives

From Igor Bogomolov <igor.bogomo...@gmail.com>
Subject Re: tracking remote reads in datanode logs
Date Wed, 25 Feb 2015 14:21:47 GMT
Thanks a lot!

Igor

On Tue, Feb 24, 2015 at 11:46 PM, Drake민영근 <drake.min@nexr.com> wrote:

> Hi, Igor
>
> The AM logs are in HDFS if you set the log aggregation property.
> Otherwise, they are in the container log directory. See this:
> http://ko.hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
>
> Thanks
>
> On Wednesday, February 25, 2015, Igor Bogomolov <igor.bogomolov@gmail.com>
> wrote:
>
>> Hi Drake,
>>
>> Thanks for the pointer. The AM log does indeed have information about
>> remote map tasks. But I'd like more low-level details, such as which node
>> each map task was scheduled on and how many bytes were read. That should
>> be in the datanode log, and I saw it there for another job. But after I
>> reinstalled the cluster it's not there anymore :(
>>
>> Could you please tell me the path where the AM log is located (the one
>> you copied the lines from)? I found it in the web interface but not as a
>> file on disk, and there is nothing in /var/log/hadoop-*
>>
>> Thanks,
>> Igor
>>
>> On Tue, Feb 24, 2015 at 1:51 AM, Drake민영근 <drake.min@nexr.com> wrote:
>>
>>> I found this in the MapReduce AM log.
>>>
>>> 2015-02-23 11:22:45,576 INFO [RMCommunicator Allocator]
>>> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before
>>> Scheduling: PendingReds:1 ScheduledMaps:5 ScheduledReds:0 AssignedMaps:0
>>> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0
>>> HostLocal:0 RackLocal:0
>>> ..
>>> 2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator]
>>> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After
>>> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5
>>> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0
>>> HostLocal:3 RackLocal:2
>>> ..
>>>
>>> The first line says there are 5 scheduled map tasks, and the second says
>>> HostLocal is 3 and RackLocal is 2. I think the 2 RackLocal tasks are the
>>> remote map tasks you mentioned before.
>>>
>>>
>>> Drake 민영근 Ph.D
>>> kt NexR
>>>
>>> On Tue, Feb 24, 2015 at 9:45 AM, Drake민영근 <drake.min@nexr.com> wrote:
>>>
>>>> Hi, Igor
>>>>
>>>> Did you look at the MapReduce application master log? I think the
>>>> host-local and rack-local map tasks are logged in the MapReduce AM log.
>>>>
>>>> Good luck.
>>>>
>>>> Drake 민영근 Ph.D
>>>> kt NexR
>>>>
>>>> On Tue, Feb 24, 2015 at 3:30 AM, Igor Bogomolov <
>>>> igor.bogomolov@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> In a small cluster of 5 nodes that run CDH 5.3.0 (Hadoop 2.5.0) I
>>>>> want to know how many remote map tasks (ones that read input data from
>>>>> remote nodes) there are in a mapreduce job. For this purpose I took the
>>>>> logs of each datanode and looked for lines with "op: HDFS_READ" and a
>>>>> cliID field that contains a map task id.
>>>>>
>>>>> Surprisingly, 4 datanode logs do not contain lines with "op: HDFS_READ".
>>>>> The other one has many lines with "op: HDFS_READ", but all cliID values
>>>>> look like DFSClient_NONMAPREDUCE_* and do not contain any map task id.
>>>>>
>>>>> I concluded there are no remote map tasks but that does not look
>>>>> correct. Also, even local reads are not logged (because there is no line
>>>>> where the cliID field contains a map task id). Could anyone please
>>>>> explain what's wrong? Why is logging not working? (I use default settings.)
>>>>>
>>>>> Chris,
>>>>>
>>>>> Found HADOOP-3062 <https://issues.apache.org/jira/browse/HADOOP-3062>
>>>>> that you have implemented. Thought you might have an explanation.
>>>>>
>>>>> Best,
>>>>> Igor
>>>>>
>>>>>
>>>>
>>>
>>
>
> --
> Drake 민영근 Ph.D
> kt NexR
>
>
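
The datanode-log search described in the original question (lines with "op: HDFS_READ" whose cliID contains a map task id) could be scripted roughly as follows. This is only a sketch under assumptions: the thread confirms the "op: HDFS_READ" and cliID fields, but the exact field order, the "bytes:" field, and the shape of a map attempt id inside cliID are assumed here, and the sample lines are hypothetical, not taken from a real log.

```python
import re
from collections import defaultdict

# Assumed layout of a datanode client-trace line. Only "op: HDFS_READ" and
# "cliID" are confirmed by the thread; "bytes:" and the ordering are assumptions.
READ_RE = re.compile(
    r"bytes: (?P<bytes>\d+), op: HDFS_READ, cliID: (?P<cli>[^,\s]+)"
)
# Assumed shape of a map attempt id embedded in the cliID value.
MAP_ATTEMPT_RE = re.compile(r"attempt_\d+_\d+_m_\d+_\d+")


def bytes_read_by_map_attempt(lines):
    """Sum HDFS_READ bytes per map attempt id found in the cliID field."""
    totals = defaultdict(int)
    for line in lines:
        match = READ_RE.search(line)
        if match is None:
            continue  # not an HDFS_READ trace line
        attempt = MAP_ATTEMPT_RE.search(match.group("cli"))
        if attempt is None:
            continue  # e.g. DFSClient_NONMAPREDUCE_* clients
        totals[attempt.group(0)] += int(match.group("bytes"))
    return dict(totals)


# Hypothetical sample lines, made up for illustration only.
SAMPLE_LINES = [
    "... bytes: 1024, op: HDFS_READ, cliID: "
    "DFSClient_attempt_1424000000000_0001_m_000000_0_123_1, offset: 0",
    "... bytes: 2048, op: HDFS_READ, cliID: "
    "DFSClient_attempt_1424000000000_0001_m_000000_0_123_1, offset: 1024",
    "... bytes: 512, op: HDFS_READ, cliID: DFSClient_NONMAPREDUCE_42_1, offset: 0",
    "... bytes: 99, op: HDFS_WRITE, cliID: "
    "DFSClient_attempt_1424000000000_0001_m_000001_0_5_1, offset: 0",
]

if __name__ == "__main__":
    for attempt_id, total in bytes_read_by_map_attempt(SAMPLE_LINES).items():
        print(attempt_id, total)
```

If no line matches at all, as reported for 4 of the 5 datanodes, the function simply returns an empty dict; it cannot tell whether the reads did not happen or were not logged.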

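The HostLocal and RackLocal counters in the RMContainerAllocator lines quoted in this thread can be extracted with a short script. A minimal sketch, assuming each "Before/After Scheduling:" entry sits on a single line in the actual AM log (it is wrapped in the email); the function name and the sample constant are hypothetical.

```python
import re

# Counters appear as Name:Number pairs after the "Scheduling:" marker.
COUNTER_RE = re.compile(r"([A-Za-z]+):(\d+)")


def parse_scheduling_line(line):
    """Return the counters of a 'Before/After Scheduling:' line, or None."""
    _, marker, tail = line.partition("Scheduling:")
    if not marker:
        return None  # not a scheduling summary line
    # Only the tail is scanned, so the timestamp (11:22:46) is never matched.
    return {name: int(value) for name, value in COUNTER_RE.findall(tail)}


# The "After Scheduling" entry quoted in the thread, joined into one line.
AFTER_SCHEDULING = (
    "2015-02-23 11:22:46,641 INFO [RMCommunicator Allocator] "
    "org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After "
    "Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 "
    "AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:5 ContRel:0 "
    "HostLocal:3 RackLocal:2"
)

if __name__ == "__main__":
    counters = parse_scheduling_line(AFTER_SCHEDULING)
    print("HostLocal:", counters["HostLocal"])
    print("RackLocal:", counters["RackLocal"])
```

These counters only give totals per locality class; they do not say which node each map task ran on or how many bytes it read.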