hadoop-common-user mailing list archives

From Chris Smith <csmi...@gmail.com>
Subject Re: collecting CPU, mem, iops of hadoop jobs
Date Tue, 03 Jan 2012 12:09:22 GMT
Have a look at OpenTSDB (http://opentsdb.net/overview.html): it does
not have the same down-sampling issue as Ganglia, and it stores the
metrics in HBase, which makes the data easier to access and process.
It's also pretty easy to add your own metrics.
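For instance, custom per-job counters can be pushed into OpenTSDB with its telnet-style "put" protocol, one line per data point, sent to the TSD port. A minimal sketch below just formats such a line; the metric and tag names are made up for illustration:

```python
import time

def tsdb_put_line(metric, value, tags, ts=None):
    """Format one data point in OpenTSDB's telnet-style 'put' protocol:
    put <metric> <unix_timestamp> <value> <tag1=v1> [tag2=v2 ...]"""
    ts = int(ts if ts is not None else time.time())
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts} {value} {tag_str}"

# Hypothetical metric and tag names; a real deployment would write this
# line to a TCP socket connected to the TSD (port 4242 by default).
line = tsdb_put_line("hadoop.job.cpu.millis", 10204000,
                     {"host": "node01", "jobid": "job_xxxx_yyyy"},
                     ts=1325592562)
print(line)
```

At least one tag is required by OpenTSDB, which is why the example always carries a host/jobid pair.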

Another useful utility is 'collectl'
(http://collectl.sourceforge.net/), which I tend to leave running in
the background on each node, collecting, storing and managing machine
metrics locally - it's very lightweight.  When I have an issue that
requires a metric I forgot to capture with Ganglia, I usually find it
in the 'collectl' logs - as long as I get to them before they roll,
which is usually after a week.  collectl also avoids the down-sampling
issue, but it doesn't automatically aggregate the data into a central
database.

Regards,

Chris

On 21 December 2011 01:20, Arun C Murthy <acm@hortonworks.com> wrote:
> Go ahead and open a MR jira (would appreciate a patch too! ;) ).
>
> thanks,
> Arun
>
> On Dec 20, 2011, at 2:55 PM, Patai Sangbutsarakum wrote:
>
>> Thanks again Arun, you saved me again.. :-)
>>
>> This is a great starting point for CPU and possibly Mem.
>>
>> For the IOPS, I'd just like to ask whether the tasknode/datanode collects the number,
>> or whether we should dig into the OS level, e.g. /proc/PID_OF_tt/io.
>> (Hope this makes sense.)
>>
>> -P
>>
>> On Tue, Dec 20, 2011 at 1:22 PM, Arun C Murthy <acm@hortonworks.com> wrote:
>>> Take a look at the JobHistory files produced for each job.
>>>
>>> With 0.20.205 you get CPU (slot millis).
>>> With 0.23 (alpha quality) you get CPU and JVM metrics (GC etc.). I believe you also get Memory, but not IOPS.
>>>
>>> Arun
>>>
>>> On Dec 20, 2011, at 1:11 PM, Patai Sangbutsarakum wrote:
>>>
>>>> Thanks for the reply, but I don't think the metrics exposed to Ganglia are
>>>> what I am really looking for..
>>>>
>>>> What I am looking for is something like this (but not limited to):
>>>>
>>>> Job_xxxx_yyyy
>>>> CPU time: 10204 sec.   <-- aggregated from all tasknodes
>>>> IOPS: 2344  <-- aggregated from all datanodes
>>>> MEM: 30G   <-- aggregated
>>>>
>>>> etc,
>>>>
>>>> Job_aaa_bbb
>>>> CPU time:
>>>> IOPS:
>>>> MEM:
>>>>
>>>> Sorry for the ambiguous question.
>>>> Thanks
>>>>
>>>> On Tue, Dec 20, 2011 at 12:47 PM, He Chen <airbots@gmail.com> wrote:
>>>>> You may need Ganglia. It is cluster monitoring software.
>>>>>
>>>>> On Tue, Dec 20, 2011 at 2:44 PM, Patai Sangbutsarakum <
>>>>> silvianhadoop@gmail.com> wrote:
>>>>>
>>>>>> Hi Hadoopers,
>>>>>>
>>>>>> We're running Hadoop 0.20 on CentOS 5.5. I am trying to find a way to collect
>>>>>> the CPU time, memory usage, and IOPS of each Hadoop job.
>>>>>> What would be a good starting point? A document? An API?
>>>>>>
>>>>>> Thanks in advance
>>>>>> -P
>>>>>>
>>>
>
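As a footnote on the /proc/PID_OF_tt/io question raised above: on Linux the per-process I/O counters are plain text and easy to scrape, so digging into the OS level is quite practical. A minimal sketch, assuming the standard key/value format of /proc/[pid]/io (sampling it twice and diffing the counters gives something IOPS-like):

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of counters
    (rchar, wchar, syscr, syscw, read_bytes, write_bytes, ...)."""
    counters = {}
    for line in text.splitlines():
        key, _, val = line.partition(":")
        if val.strip():
            counters[key.strip()] = int(val)
    return counters

# Sample contents in the /proc/<pid>/io format, for illustration.
sample = """rchar: 3290
wchar: 480
syscr: 10
syscw: 5
read_bytes: 4096
write_bytes: 0
cancelled_write_bytes: 0"""

io = parse_proc_io(sample)
print(io["read_bytes"], io["syscr"])

# On a live node one would read the real file for the tasktracker PID:
# with open(f"/proc/{pid}/io") as f:
#     io = parse_proc_io(f.read())
```

Note that read_bytes/write_bytes count actual storage I/O, while rchar/wchar include cached reads and writes, so pick the counters that match what you mean by "IOPS".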
