eagle-user mailing list archives

From "Zhang, Edward (GDI Hadoop)" <yonzh...@ebay.com>
Subject Re: [Discuss] Hadoop metrics,job,GC monitoring
Date Tue, 15 Dec 2015 07:48:24 GMT
I have started some documentation at
https://cwiki.apache.org/confluence/display/EAG/Hadoop+Native+Metrics+Monitoring

Thanks to Hao, Ralph, and others for the offline review and suggestions; I
will improve the document accordingly.

Regarding the question "if a user adds a new metric to monitor, how would
the processing layer change accordingly?":

I think that if a user adds a new metric, it should be added to the
metadata table, so that the data source layer and the processing layer both
see a consistent list of metrics.
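
As a rough illustration of that idea, here is a minimal sketch in which both
layers read the metric list from one shared metadata table. This is only an
assumption about how it could look, not Eagle's actual design; the class
names and metric names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDef:
    """One row of the (hypothetical) metric metadata table."""
    name: str       # illustrative metric name, not a real Eagle identifier
    component: str  # e.g. "namenode"


class MetricMetadataTable:
    """In-memory stand-in for the metadata table; a real one would be persisted."""

    def __init__(self):
        self._metrics = {}

    def add(self, metric):
        # Adding a metric here is the only step a user has to perform.
        self._metrics[metric.name] = metric

    def names(self):
        return sorted(self._metrics)


class DataSourceLayer:
    """Decides which metrics to collect by reading the shared table."""

    def __init__(self, table):
        self._table = table

    def metrics_to_collect(self):
        return self._table.names()


class ProcessingLayer:
    """Decides which metrics to process by reading the same shared table."""

    def __init__(self, table):
        self._table = table

    def metrics_to_process(self):
        return self._table.names()


table = MetricMetadataTable()
source = DataSourceLayer(table)
processing = ProcessingLayer(table)

# A user registers new metrics once; neither layer needs a code change.
table.add(MetricDef("hadoop.namenode.capacityused", "namenode"))
table.add(MetricDef("hadoop.namenode.missingblocks", "namenode"))

print(source.metrics_to_collect() == processing.metrics_to_process())
```

Because neither layer keeps its own copy of the list, the two cannot drift
apart when a metric is added.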

But we still need to bake this design, so please share whatever thoughts
you have.

Thanks
Edward


On 12/14/15, 11:04, "Arun Manoharan" <arunmanoharan@apache.org> wrote:

>Thanks Edward for starting the thread. I think it is important to have the
>job monitoring (MR/Spark) workloads for performance of the cluster and
>availability.
>
>But it will be beneficial to have an extensible framework where users can
>create business rules like "I want an alert when NN is in safemode or RM
>is flipping etc".
>
>Thanks,
>Arun
>
>On Mon, Dec 14, 2015 at 10:58 AM, Zhang, Edward (GDI Hadoop) <
>yonzhang@ebay.com> wrote:
>
>> Hi Eagle devs/users,
>>
>> As proposed in apache eagle incubator proposal, Eagle will start
>> design/dev to support Hadoop system monitoring besides security
>> monitoring which includes Hadoop native metrics, job, gclog etc.
>>
>> The community is also interested in Hadoop system monitoring by Eagle
>> when we recently talked about Eagle product in public conferences,
>> meet up etc.
>>
>> Take Hadoop native metrics as an example, first of all those metrics are
>> pretty valuable in determining system health status, secondly collecting
>> huge amount metrics, visualizing, and alerting is very challenging.  We
>> need think of declarative collection, dynamic aggregation, metric
>> storage, metric query engine etc.
>>
>> Besides technical design, comprehensive policy/rule are also valuable to
>> be shared in the community. Those policy/rule represent best practice in
>> the world to manage large Hadoop clusters.
>>
>> Please suggest whatever is for engineering design or business
>> policy/rules.
>>
>> Thanks
>> Edward
>>
>>

