From: Apache Wiki
To: core-commits@hadoop.apache.org
Date: Thu, 11 Jun 2009 07:50:17 -0000
Message-ID: <20090611075018.23805.46583@eos.apache.org>
Subject: [Hadoop Wiki] Update of "Anomaly Detection Framework with Chukwa" by JiaqiTan

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JiaqiTan:
http://wiki.apache.org/hadoop/Anomaly_Detection_Framework_with_Chukwa

------------------------------------------------------------------------------
  Hence, the overall workflow of the anomaly detection would be as follows:
- 1. MapReduce/Pig job processes post-Demux output to generate abstract view and/or anomaly detection output
+ 1. MapReduce/Pig job processes post-Demux output to generate abstract view and/or anomaly detection output (these jobs would be scheduled by the PostProcessor, which would serve as the entry point to the execution logic of the anomaly detection framework)
  1. (Optional) Additional MapReduce/Pig job processes abstract views/anomaly detection output to generate secondary anomaly detection output
  1. Data fed into HICC via an RDBMS
  1. HICC widget loads anomaly detection/abstract view data from RDBMS for visualization
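A minimal sketch of the last two steps above (anomaly data in an RDBMS, read back by a HICC widget), assuming a hypothetical {{{anomaly_scores(node, score, ts)}}} table in a MySQL-backed Chukwa database; this is not actual HICC or MDL code, and the JDBC URL, credentials, and column names are illustrative assumptions only:

{{{#!java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class AnomalyWidgetQuery {
  public static List<String> loadRecentAnomalies(long sinceMillis) throws Exception {
    // Assumes the MySQL Connector/J driver is on the classpath.
    Class.forName("com.mysql.jdbc.Driver");
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost/chukwa", "hicc", "secret");
    List<String> rows = new ArrayList<String>();
    try {
      PreparedStatement stmt = conn.prepareStatement(
          "SELECT node, score, ts FROM anomaly_scores WHERE ts >= ? ORDER BY ts");
      stmt.setLong(1, sinceMillis);
      ResultSet rs = stmt.executeQuery();
      while (rs.next()) {
        // One row per (node, timestamp) anomaly score; a real widget would
        // hand these to its charting layer instead of returning strings.
        rows.add(rs.getString("node") + "\t" + rs.getDouble("score")
            + "\t" + rs.getLong("ts"));
      }
    } finally {
      conn.close();
    }
    return rows;
  }
}
}}}

The abstract-view widgets would presumably follow the same RDBMS-to-widget path, with a different table and query.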
@@ -31, +31 @@

  === Hadoop anomaly detection and behavioral visualization ===
- Current active developments for the Chukwa Anomaly Detection Framework are for detecting anomalies in Hadoop based on the following tools/concepts:
+ Current active development for the Chukwa Anomaly Detection Framework focuses on detecting anomalies in Hadoop based on the following tools/concepts from the CMU [http://www.ece.cmu.edu/~fingerpointing/ Fingerpointing project]:
  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for state-machine extraction of Hadoop's behavior from its logs
  1. [http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ SALSA] for Hadoop task-based anomaly detection using Hadoop's logs
  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/mochi_tan_hotcloud09_abs.html Mochi] for visualization of Hadoop's behavior (Swimlanes plots, MIROS heatmaps of aggregate data-flow) and extraction of causal job-centric data-flows (JCDF)
  1. [http://www.pdl.cmu.edu/PDL-FTP/stray/CMU-PDL-08-112_abs.html Ganesha] for node-based anomaly detection using OS-collected black-box metrics
- The workflow is as follows (class names, if available, + status listed in square brackets):
+ The {{{FSMBuilder}}} component implements SALSA state-machine extraction; it is a MapReduce job that reads SequenceFiles of ChukwaRecords and outputs SequenceFiles of ChukwaRecords, with each ChukwaRecord storing a single state. We describe the workflows for some of the tools below.
+
+ ==== Swimlanes Visualization ====
- 1. ({{{FSMBuilder}}}, available soon) SALSA is used to extract state-machine views of Hadoop's execution - uses post-Demux output; uses {{{JobData/JobHistory}}},{{{ClientTrace}}},and TaskTracker-generated {{{userlogs}}}
- 1. Anomaly detection MapReduce program reads in state-machine data generated from {{{FSMBuilder}}} to generate anomaly alerts.
- 1. (CHUKWA-279) State-machine data from {{{FSMBuilder}}} is loaded into RDBMS using MDL
- 1. (CHUKWA-279) Raw state-machine views visualized using Swimlanes visualization HICC widget which reads data from RDBMS
+ This visualization shows the detailed task-level progress of MapReduce jobs across nodes in the cluster.
+
+ 1. ({{{FSMBuilder}}} MapReduce job, available soon) SALSA is used to extract state-machine views of Hadoop's execution - uses post-Demux output; uses {{{JobData/JobHistory}}}
+ 1. ([http://issues.apache.org/jira/browse/CHUKWA-279 CHUKWA-279]) State-machine data from {{{FSMBuilder}}} is loaded into the RDBMS using MDL
+ 1. ([http://issues.apache.org/jira/browse/CHUKWA-279 CHUKWA-279]) Raw state-machine views are visualized using the Swimlanes HICC widget, which reads data from the RDBMS
+
+ ==== MIROS (N x N heatmaps) Visualization ====
+
+ This visualization shows the aggregate data-flows across DataNodes in an HDFS instance.
+
+ 1. ({{{FSMBuilder}}}, available soon) SALSA is used to extract state-machine views of Hadoop's execution - uses post-Demux output; uses {{{ClientTraceDetailed}}} ([http://issues.apache.org/jira/browse/CHUKWA-282 CHUKWA-282])
+ 1. (MapReduce job to aggregate HDFS activity across states) Process states as seen from the state-machine view to generate aggregate counts of HDFS activity
+ 1. Aggregate activity data is loaded into the RDBMS using MDL
+ 1. Visualization of heatmaps using a HICC widget
+
+ ==== Task-based Anomaly Detection ====
+
+ 1. ({{{FSMBuilder}}}, available soon) SALSA is used to extract state-machine views of Hadoop's execution - uses post-Demux output; uses {{{JobData/JobHistory}}}
+ 1. (MapReduce job) Collect states from the state-machine view and process them to generate a list of anomalous nodes, possibly a list of anomalous nodes per unit time for incremental/online diagnosis of anomalies
+ 1. Load anomaly data into the RDBMS using MDL
+ 1. Visualization of heatmaps using a HICC widget
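To make the task-based anomaly detection workflow above concrete, here is a minimal sketch of its MapReduce step. Two simplifying assumptions are made: the {{{FSMBuilder}}} states are assumed to have been flattened into tab-separated text lines of the form {{{stateName<TAB>host<TAB>durationMillis}}} (rather than the real SequenceFiles of ChukwaRecords), and a node is flagged as anomalous when its mean state duration lies more than two standard deviations from the cluster-wide mean of per-node means. The peer-comparison heuristic is a stand-in for illustration only, not the SALSA/Ganesha algorithms.

{{{#!java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TaskDurationAnomalyDetector {

  // Map: key each flattened state record by state name so a single reducer
  // sees all (host, duration) pairs for that state across the cluster.
  public static class StateMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      if (fields.length == 3) {
        context.write(new Text(fields[0]), new Text(fields[1] + "\t" + fields[2]));
      }
    }
  }

  // Reduce: compute each host's mean duration for this state, then flag hosts
  // whose mean is more than two standard deviations from the cluster mean.
  public static class OutlierReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text state, Iterable<Text> hostDurations, Context context)
        throws IOException, InterruptedException {
      Map<String, Double> sum = new HashMap<String, Double>();
      Map<String, Integer> count = new HashMap<String, Integer>();
      for (Text value : hostDurations) {
        String[] parts = value.toString().split("\t");
        String host = parts[0];
        double duration = Double.parseDouble(parts[1]);
        sum.put(host, (sum.containsKey(host) ? sum.get(host) : 0.0) + duration);
        count.put(host, (count.containsKey(host) ? count.get(host) : 0) + 1);
      }
      int n = sum.size();
      if (n < 2) {
        return; // need at least two hosts to compare against peers
      }
      double clusterSum = 0.0, clusterSumSq = 0.0;
      for (String host : sum.keySet()) {
        double mean = sum.get(host) / count.get(host);
        clusterSum += mean;
        clusterSumSq += mean * mean;
      }
      double clusterMean = clusterSum / n;
      double stdDev = Math.sqrt(Math.max(0.0, clusterSumSq / n - clusterMean * clusterMean));
      for (String host : sum.keySet()) {
        double mean = sum.get(host) / count.get(host);
        if (stdDev > 0.0 && Math.abs(mean - clusterMean) > 2.0 * stdDev) {
          context.write(new Text(host),
              new Text("anomalous in state " + state + ", mean duration " + mean + " ms"));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "task-based-anomaly-detection-sketch");
    job.setJarByClass(TaskDurationAnomalyDetector.class);
    job.setMapperClass(StateMapper.class);
    job.setReducerClass(OutlierReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // flattened FSMBuilder states
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // anomalous-node list
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

Per the workflow above, the emitted (node, reason) pairs would then be loaded into the RDBMS by MDL and rendered by a HICC widget.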