hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Anomaly Detection Framework with Chukwa" by JiaqiTan
Date Thu, 11 Jun 2009 07:37:17 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JiaqiTan:

- Describe Anomaly Detection Framework with Chukwa here.
  == Introduction ==
- Hadoop is a great computation platform for MapReduce jobs, but troubleshooting a faulty compute
node in the cluster is not an easy task. The Chukwa Anomaly Detection System is a system for
detecting computer failure and misuse by monitoring system activity and classifying it as
either normal or anomalous. The classification is based on heuristics, rules, and patterns,
and will detect any type of misuse that falls outside normal system operation.
+ We describe a general framework for implementing algorithms for detecting anomalies in systems
(Hadoop or otherwise) being monitored by Chukwa, by using the data collected by the Chukwa
framework, as well as for visualizing the outcomes of these algorithms. We envision that anomaly
detection algorithms for the Chukwa-monitored clusters can be most naturally implemented as
described here. 
- In order to determine what constitutes a failure, the system must be taught to recognize normal
system activity. This can be accomplished in several ways, most often with artificial intelligence
type techniques. Systems using neural networks have been used to great effect. Another method
is to define what normal usage of the system comprises using a strict mathematical model,
and flag any deviation from this as a system problem. This is known as strict anomaly detection.
 For the prototyping phase, Chukwa will use a strict mathematical model as the skeleton.
+ The types of operations that this framework would enable fall in these broad categories:

+  1. Performing anomaly detection on collected system data (metrics, logs) to identify system
elements (nodes, jobs, tasks) that are anomalous,
+  1. Applying higher-level processing on collected system data to generate abstract views
of the monitored system that synthesize multiple viewpoints,
+  1. Applying higher-level processing on anomaly detection output to generate secondary anomaly
detection output, and
+  1. Presenting and/or visualizing the outcomes of the above steps.
  == Design ==
- A new processing pipeline has been introduced to post demux processor.  This enables Chukwa
to run ping/mr job based aggregation and anomaly detection framework.
+ The tasks described above will be performed in a PostProcess stage which occurs after the
Demux. These tasks will take as their inputs the output of the Demux stage, and generate as
their outputs (i) anomalous system elements, (ii) abstract system views, or (iii) visualizable
data (e.g. raw datapoints to be fed into visualization widgets). These tasks will be MapReduce
or Pig jobs. Chukwa would manage them by accepting a list of MapReduce and/or Pig jobs, which
together would form the anomaly detection workflow.
+ To stay consistent with the rest of the Chukwa architecture, the jobs in the anomaly detection
workflow would have to accept SequenceFiles of ChukwaRecords as their inputs, and would generate
SequenceFiles of ChukwaRecords as their outputs.
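
The "records in, records out" contract is what lets Chukwa chain an arbitrary list of jobs. The following is a minimal Python sketch of that idea only, not Chukwa code: `Record` is a hypothetical stand-in for a ChukwaRecord, plain lists stand in for SequenceFiles, and `run_workflow`, `flag_slow_tasks`, `count_by_host`, the field names, and the 60-second threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for a ChukwaRecord: a timestamp plus string fields."""
    time: int
    fields: Dict[str, str] = field(default_factory=dict)


# Every job has the same shape: a list of records in, a list of records out.
Job = Callable[[List[Record]], List[Record]]


def run_workflow(jobs: List[Job], demux_output: List[Record]) -> List[Record]:
    """Run the jobs in order, feeding each job the previous job's output."""
    records = demux_output
    for job in jobs:
        records = job(records)
    return records


def flag_slow_tasks(records: List[Record]) -> List[Record]:
    """Primary detection: keep only tasks slower than an assumed 60 s limit."""
    return [
        Record(r.time, {**r.fields, "anomalous": "true"})
        for r in records
        if float(r.fields["duration_s"]) > 60.0
    ]


def count_by_host(records: List[Record]) -> List[Record]:
    """Secondary processing: count flagged tasks per host."""
    counts: Dict[str, int] = {}
    for r in records:
        host = r.fields["host"]
        counts[host] = counts.get(host, 0) + 1
    return [Record(0, {"host": h, "alerts": str(n)}) for h, n in sorted(counts.items())]


demux_output = [
    Record(1, {"host": "node1", "task": "t1", "duration_s": "12.0"}),
    Record(2, {"host": "node2", "task": "t2", "duration_s": "95.5"}),
    Record(3, {"host": "node2", "task": "t3", "duration_s": "70.0"}),
]
summary = run_workflow([flag_slow_tasks, count_by_host], demux_output)
```

Because both jobs share one input and output type, the optional secondary job can be appended to or dropped from the list without changing the first job.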
+ Finally, the outputs of these tasks would be fed into HICC for visualization. The current
approach would be to use the MDL (Metrics Data Loader) to load the data into an RDBMS of choice,
from which HICC widgets can read it.
+ Hence, the overall workflow of the anomaly detection would be as follows:
+  1. MapReduce/Pig job processes post-Demux output to generate abstract view and/or anomaly
detection output
+  1. (Optional) Additional MapReduce/Pig job processes abstract views/anomaly detection output
to generate secondary anomaly detection output
+  1. Data fed into HICC via an RDBMS
+  1. HICC widget loads anomaly detection/abstract view data from RDBMS for visualization
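
The last two steps of the workflow above (load into an RDBMS, then read from a widget) can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` as a stand-in for the "RDBMS of choice", and the `anomaly_alerts` table, its columns, and the helper names `load_alerts` and `alerts_for_widget` are assumptions, not the MDL schema or the HICC API.

```python
import sqlite3


def load_alerts(conn, alerts):
    """Loader side: store (timestamp, host, score) rows in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS anomaly_alerts "
        "(ts INTEGER, host TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO anomaly_alerts (ts, host, score) VALUES (?, ?, ?)", alerts
    )
    conn.commit()


def alerts_for_widget(conn, since_ts):
    """Widget side: fetch the rows to plot, oldest first."""
    cur = conn.execute(
        "SELECT ts, host, score FROM anomaly_alerts WHERE ts >= ? ORDER BY ts",
        (since_ts,),
    )
    return cur.fetchall()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
load_alerts(conn, [(100, "node1", 0.2), (200, "node2", 3.7), (300, "node2", 4.1)])
rows = alerts_for_widget(conn, 150)
```

Keeping the database between the jobs and the widgets means the visualization side never needs to know whether a row came from a primary or a secondary detection job.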
  == Implementation ==
+ === Hadoop anomaly detection and behavioral visualization  ===
+ Current active developments for the Chukwa Anomaly Detection Framework are for detecting
anomalies in Hadoop based on the following tools/concepts:
+  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for State-machine
extraction of Hadoop's behavior from its logs
+  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for Hadoop
task-based anomaly detection using Hadoop's logs
+  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/mochi_tan_hotcloud09_abs.html Mochi] for visualization
of Hadoop's behavior (Swimlanes plots, MIROS heatmaps of aggregate data-flow) and extraction
of causal job-centric data-flows (JCDF)
+  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/CMU-PDL-08-112_abs.html Ganesha] for node-based
anomaly detection using OS-collected black-box metrics
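
To give a flavour of node-based detection on black-box metrics, here is a minimal Python sketch of a generic peer-comparison check. It is not the Ganesha algorithm: it simply assumes that nodes doing similar work should report similar values, and flags any node whose metric lies far from the cluster median, measured in units of the median absolute deviation. The function name, the threshold of 3.0, and the sample values are hypothetical.

```python
from statistics import median


def anomalous_nodes(metric_by_node, threshold=3.0):
    """Return the nodes whose metric deviates strongly from their peers.

    metric_by_node maps a node name to one OS-level metric value
    (for example CPU utilisation in percent).
    """
    values = list(metric_by_node.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to compare against, so this sketch reports nothing.
        return []
    return sorted(
        node
        for node, value in metric_by_node.items()
        if abs(value - med) / mad > threshold
    )


cpu_percent = {
    "node1": 21.0,
    "node2": 19.5,
    "node3": 20.5,
    "node4": 88.0,
    "node5": 20.0,
}
suspects = anomalous_nodes(cpu_percent)
```

The median and the median absolute deviation are used instead of the mean and standard deviation so that a single faulty node does not distort the baseline it is compared against.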
+ The workflow is as follows (class names, if available, + status listed in square brackets):
+  1. ({{{FSMBuilder}}}, available soon) SALSA is used to extract state-machine views of Hadoop's
execution - uses post-Demux output; uses {{{JobData/JobHistory}}}, {{{ClientTrace}}}, and TaskTracker-generated
+  1. Anomaly detection MapReduce program reads in state-machine data generated from {{{FSMBuilder}}}
to generate anomaly alerts. 
+  1. (CHUKWA-279) State-machine data from {{{FSMBuilder}}} is loaded into RDBMS using MDL

+  1. (CHUKWA-279) Raw state-machine views visualized using Swimlanes visualization HICC widget
which reads data from RDBMS
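
The first step of this workflow, turning log events into state-machine data that a Swimlanes-style plot can draw, can be illustrated with a small Python sketch. It is not the {{{FSMBuilder}}} or SALSA implementation: the event tuple layout `(timestamp, task_id, state, edge)` and the function name `build_state_intervals` are assumptions, and the sketch only pairs each "start" event with the matching "end" event.

```python
def build_state_intervals(events):
    """Pair start/end events into (task_id, state, start_ts, end_ts) intervals.

    events is an iterable of (timestamp, task_id, state, edge) tuples, where
    edge is "start" or "end". States that never end are left out.
    """
    open_states = {}  # (task_id, state) -> timestamp of the pending start
    intervals = []
    for ts, task_id, state, edge in sorted(events):
        key = (task_id, state)
        if edge == "start":
            open_states[key] = ts
        elif edge == "end" and key in open_states:
            intervals.append((task_id, state, open_states.pop(key), ts))
    return intervals


events = [
    (12, "task_1", "map", "end"),
    (0, "task_1", "map", "start"),
    (5, "task_2", "map", "start"),
    (40, "task_2", "map", "end"),
    (41, "task_3", "reduce", "start"),  # still running: no end event yet
]
intervals = build_state_intervals(events)
```

Each resulting interval corresponds to one horizontal bar in a swimlane plot, and the interval durations are the kind of per-task data a downstream detection job could compare across tasks.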
