chukwa-dev mailing list archives

From "michael yu (JIRA)" <>
Subject [jira] [Commented] (CHUKWA-680) Pattern recognition of Hadoop generated metrics
Date Fri, 27 Dec 2013 03:37:52 GMT


michael yu commented on CHUKWA-680:

Hi Otis,

I may not have included a screenshot of the accuracy.  You can refer to Chapter 6, Performance
and Benchmarks, in the attached report.  From all of my testing on the data set I provided,
I recall the accuracy being anywhere between 95% and 100%.

In general, the larger the data set you feed to SVM, the better (and more accurate) the training
model.

Unfortunately, the code was implemented in a way that is specific to querying and parsing
the metrics data from HBase in a Hadoop environment.  The code can (and should) be refactored
and generalized to process metrics from different datasource types.
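
As a rough illustration of that refactoring, here is a minimal sketch in Python. It is not taken
from the project code: MetricsSource, InMemorySource and to_vectors are hypothetical names, and
the in-memory class only stands in for an HBase-backed (or any other) datasource.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

# Hypothetical type: one sample maps a metric name to its value.
Sample = Dict[str, float]


class MetricsSource(Protocol):
    """Anything that can yield metric samples (hypothetical interface)."""

    def samples(self) -> Iterator[Sample]:
        ...


class InMemorySource:
    """Stand-in for a real datasource such as HBase; holds samples in a list."""

    def __init__(self, rows: Iterable[Sample]) -> None:
        self._rows = list(rows)

    def samples(self) -> Iterator[Sample]:
        return iter(self._rows)


def to_vectors(source: MetricsSource, names: List[str]) -> List[List[float]]:
    """Build fixed-order feature vectors; metrics missing from a sample become 0.0."""
    return [[s.get(n, 0.0) for n in names] for s in source.samples()]
```

With this shape, the training and prediction code would depend only on samples(), so a new
datasource type would need just one new class.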

> Pattern recognition of Hadoop generated metrics
> -----------------------------------------------
>                 Key: CHUKWA-680
>                 URL:
>             Project: Chukwa
>          Issue Type: New Feature
>          Components: Data Collection
>         Environment: IBM InfoSphere BigInsights Enterprise
>            Reporter: michael yu
>            Assignee: michael yu
>            Priority: Minor
>              Labels: GSoC, GSoC2013
>         Attachments: Yu, Michael et al-project-report-draft.pdf
>   Original Estimate: 2,760h
>  Remaining Estimate: 2,760h
> Charles Lin and I are working on our IBM SJSU masters project on "Pattern recognition of Hadoop generated metrics".
> The purpose of the project is to use libsvm to predict the health of the cluster.
> The scope of the project includes:
> 1) gathering large scale data set of metrics for healthy and unhealthy clusters
> 2) use #1 and libsvm to generate training model
> 3) periodic collection of metrics and comparing against training model using libsvm to predict the cluster health
>    a) if unhealthy, send email notification to system administrator 
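
Steps 1 and 2 above need the gathered metrics in the sparse text format that libsvm reads, one
sample per line: a label followed by index:value pairs with 1-based indices. The sketch below
shows that conversion only. The label values (1 for healthy, -1 for unhealthy) and the function
name to_libsvm_line are assumptions made for illustration, not details from the project.

```python
from typing import Sequence

HEALTHY = 1     # assumed label for a sample from a healthy cluster
UNHEALTHY = -1  # assumed label for a sample from an unhealthy cluster


def to_libsvm_line(label: int, features: Sequence[float]) -> str:
    """Format one sample as '<label> <index>:<value> ...' with 1-based indices.

    Zero-valued features are left out, which the sparse format permits.
    """
    pairs = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)
```

For example, to_libsvm_line(HEALTHY, [0.5, 0.0, 12.0]) gives "1 1:0.5 3:12". A file of such
lines is the kind of input that libsvm's svm-train tool takes to build a model.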

