incubator-general mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject [DISCUSS] Metron incubator proposal
Date Mon, 30 Nov 2015 16:55:38 GMT
Hi all,

We'd like to start a discussion about creating Metron as an incubator
podling. The proposal is on the wiki here:
https://wiki.apache.org/incubator/MetronProposal

I would call your attention to the background section in particular. The
condensed version is that the original code base (OpenSOC) was created by a
company (Cisco) that put it on GitHub under the ALv2 but has since stopped
working on it. We posted a message
<https://groups.google.com/d/msg/opensoc-support/rFlW2uSSvmU/Sw_cO-T2AAAJ>
to the OpenSOC support group a month ago proposing a move to Apache and got
a single positive response.

The text of the proposal is included below for easy quoting during
discussion.

Thanks,
   Owen

= Apache Metron Proposal =

== Abstract ==

The Metron project is an open source project dedicated to providing an
extensible and scalable advanced security analytics tool. It has strong
foundations in the Apache Hadoop ecosystem.

== Proposal ==

Metron integrates a variety of open source big data technologies in order
to offer a centralized tool for security monitoring and analysis. Metron
provides capabilities for log aggregation, full packet capture indexing,
storage, advanced behavioral analytics and data enrichment, while applying
the most current threat-intelligence information to security telemetry
within a single platform.

Metron can be divided into 4 areas:

  1. '''A mechanism to capture, store, and normalize any type of security
telemetry at extremely high rates.''' Because security telemetry is
constantly being generated, it requires a method for ingesting the data at
high speeds and pushing it to various processing units for advanced
computation and analytics.
  1. '''Real time processing and application of enrichments''' such as
threat intelligence, geolocation, and DNS information to telemetry being
collected. The immediate application of this information to incoming
telemetry provides the context and situational awareness, as well as the
“who” and “where” information that is critical for investigation.
  1. '''Efficient information storage''' based on how the information will
be used:
    a. Logs and telemetry are stored such that they can be efficiently
mined and analyzed for concise security visibility
    a. The ability to extract and reconstruct full packets helps an analyst
answer questions such as who the true attacker was, what data was leaked,
and where that data was sent
    a. Long-term storage not only increases visibility over time, but also
enables advanced analytics such as machine learning techniques to be used
to create models on the information. Incoming data can then be scored
against these stored models for advanced anomaly detection.
  1. '''An interface that gives a security investigator a centralized view
of data and alerts passed through the system.''' Metron’s interface
presents alert summaries with threat intelligence and enrichment data
specific to that alert on one single page. Furthermore, advanced search
capabilities and full packet extraction tools are presented to the analyst
for investigation without the need to pivot into additional tools.
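
As a rough illustration of the second area (applying enrichments to incoming telemetry), the following minimal Python sketch tags a message with geolocation and threat-intelligence context. It is not OpenSOC or Metron code: the `enrich` function, the field names, and the in-memory lookup tables are all hypothetical stand-ins for real feeds and services.

```python
# Hypothetical in-memory stand-ins for a threat-intelligence feed and a
# geolocation database; a real deployment would query external sources.
THREAT_INTEL = {"203.0.113.7": "known-botnet-c2"}
GEO = {"203.0.113.7": "NL", "198.51.100.4": "US"}


def enrich(message: dict) -> dict:
    """Return a copy of a telemetry message with enrichment fields added."""
    enriched = dict(message)
    dst = message.get("dst_ip")
    # "Where" context: country of the destination address, if known.
    enriched["geo"] = GEO.get(dst, "unknown")
    # Threat-intelligence context: label from the feed, or None if no match.
    threat = THREAT_INTEL.get(dst)
    enriched["threat"] = threat
    enriched["is_alert"] = threat is not None
    return enriched


suspicious = enrich({"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"})
benign = enrich({"src_ip": "10.0.0.5", "dst_ip": "198.51.100.4"})
```

In this sketch the enrichment happens per message as it arrives, so an analyst would see the added context alongside the original fields rather than having to look it up afterwards.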

Big data is a natural fit for powerful security analytics. The Metron
framework integrates a number of elements from the Hadoop ecosystem to
provide a scalable platform for security analytics, incorporating such
functionality as full-packet capture, stream processing, batch processing,
real-time search, and telemetry aggregation. With Metron, our goal is to
tie big data into security analytics and drive towards an extensible
centralized platform to effectively enable rapid detection and rapid
response for advanced security threats.

== Background ==

OpenSOC was developed by Cisco over the last two years and pushed out to
Github (https://github.com/OpenSOC/opensoc) under the ALv2. However, the
development was mostly closed and has largely stopped. As evidence of the
inactivity, users have complained that pull requests go unanswered for long
periods:
https://groups.google.com/d/msg/opensoc-support/R2W-ZFux8Vk/Y-5tL-EmAAAJ.
Finally, no public releases of OpenSOC have been made. From an Apache point
of view, the current community is not viable.

However, some of the developers of the project have left Cisco and have
found interest from several others that would like to work together to form
an active and open community at Apache starting from the current OpenSOC
code base. A message to the current support group proposing moving to
Apache got a single positive response.
https://groups.google.com/d/msg/opensoc-support/rFlW2uSSvmU/09PIsWL4AAAJ

Because Cisco is not currently interested in being involved, the project
expects to change its name. The project would like to use Metron,
although we will perform a podling name search to check for conflicts.
Metron, meaning measure, is half of the Greek root for the word
'telemetry.'  Metron is also a DC Comics character who “... wanders in
search of greater knowledge beyond his own”.


== Rationale ==
Metron strives to move the state of the art in security analytics forward.
We want to move away from the proprietary nature of legacy security point
tools and develop an open platform where people can contribute and share
datasets, machine learning models, telemetry parsers, sources of telemetry
enrichment, and threat intelligence feeds.  Cyber security is too large a
problem for a single corporation to tackle on its own and the current
tooling is too fragmented and proprietary for us to be able to rally around
a single tool or vendor.

In addition to being open and facilitating advancement in security
analytics, Metron has several advantages over a conventional Security
Information Management System (SIEM).

  * Metron uses an entirely open source stack under the hood and runs on
commodity hardware.  This means Metron is much cheaper to run than the
competition. In security, cost is a major factor because the cost of your
countermeasure for monitoring and reacting to a threat should not exceed
the cost of what is being protected.  By driving down the cost of security,
the economics work for more assets to be monitored, which means more
secure data centers.
  * Metron, being in the open, allows additional vetting and scrutiny by
the open source community for all of its components.  This is a better
model for a security-oriented tool than doing it closed source.  All the
problems should be flushed out and fixed in the open. The closed source
competition does not have this kind of rigor, is motivated by marketing and
sales, and thus, does not inspire confidence when it comes to security.
  * Being Hadoop-based, Metron can process unprecedented volumes of
streaming data via Apache Storm.  When an organization is hit with malware
or malicious behavior, this most commonly happens as part of a global
malware campaign, signatures for which are known and are available from
third party threat intelligence feeds.  Having the ability to take in all
the feeds and reference them against every telemetry message processed by
Metron in real time not only facilitates detection of such campaigns, it
also changes the economics for the “bad guys”.  If they have to customize
their malware for each of their targets, these global attacks become a lot
more expensive and nonviable for them.
  * Metron strives to shift conventional SOC workflows away from being
rules-driven to a more data-driven approach that incorporates machine
learning and a higher degree of automation and autonomous detection.  The
modern threat landscape is too dynamic to be manageable via static rules
alone, which is what conventional SIEMs rely on.  Rule bases tend to bloat,
and if improperly maintained turn themselves into sources of false positive
alerts.
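
To make the contrast between static rules and a data-driven approach concrete, here is a minimal Python sketch that scores a new observation against a model built from stored history. It is only an illustration under stated assumptions: the z-score model, the function names, the sample numbers, and the threshold of 3 are all hypothetical and are not taken from OpenSOC.

```python
import statistics


def fit_model(history: list[float]) -> dict:
    """Summarize historical observations (data at rest) as a simple model."""
    return {
        "mean": statistics.mean(history),
        "stdev": statistics.stdev(history),
    }


def score(model: dict, value: float) -> float:
    """Return how many standard deviations a new value is from the mean."""
    if model["stdev"] == 0:
        return 0.0
    return abs(value - model["mean"]) / model["stdev"]


# Hypothetical history: bytes sent per minute by one host.
history = [100.0, 110.0, 90.0, 105.0, 95.0]
model = fit_model(history)

# Assumed threshold for this sketch; a real system would tune it.
THRESHOLD = 3.0
typical_is_anomaly = score(model, 100.0) > THRESHOLD
spike_is_anomaly = score(model, 500.0) > THRESHOLD
```

Unlike a fixed rule such as "alert above N bytes", the boundary here moves with each host's own history, which is the kind of behavior the data-driven approach is aiming for.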

The ability to analyze and model large volumes of data at rest and then
being able to push up the output of that into a stream processor is
essential in disrupting the

== Current Status ==

As stated in the background section, the current community isn’t healthy,
which is why we are proposing moving to Apache Incubator. In this section,
we will describe the current state of the OpenSOC project.

=== Meritocracy ===
The OpenSOC development is controlled by Cisco and pull requests are being
ignored. The development list is private and requests to join are rejected
because there is no activity on it. The goal of moving to Apache is to form
a meritocracy where a variety of individuals, regardless of their current
employer, come together and work together. We understand that diversity,
open development, and open governance are critical to being a successful
Apache project.

=== Community ===
The OpenSOC project is not responding to pull requests or making releases.
The easiest solution would be to create a variety of forks of the project
on github, but that would further fracture the community and prevent it
from reaching critical mass. Our preferred solution is to build a single
large, diverse, and open community at Apache.

=== Core Developers ===
The core developers of Metron are James Sirota, Charles Porter, and Mark
Bittmann. None of them have experience running an open source project, but
they are eager to learn.

=== Alignment ===
The ASF is a natural host for Metron given that it is already the home of
Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
projects. Metron leverages many Apache open-source products. We are very
interested in a place to develop our community and integrations with the
other Apache big data projects.

== Known Risks ==

=== Orphaned Products ===

The current product developers are all salaried developers at a small
number of companies and thus there is a risk of becoming an orphaned
product. However, the companies view Metron as very important to their
product offering and plan to ramp up their work in the space. The project
is unique in the product space and thus has strong potential to become a
sustainable community.

=== Inexperience with Open Source ===
The vast majority of the developers are inexperienced with open source
development and the Apache Way. One of the major hurdles to graduation from
the Apache Incubator will be demonstrating that they have learned the
Apache Way and are applying it to how the project is managed. Vinod Kumar
Vavilapalli is an Apache Member and plans on actively working as a
committer in the project. They also have the other mentors to help them
learn as they progress.

=== Homogenous Developers ===
The developers are employed by four diverse companies (B23, Hortonworks,
Mantech, and Rackspace). They are distributed across the United States. We
hope to attract additional diversity as an Apache project.

=== Reliance on Salaried Developers ===
Metron is currently being developed exclusively by salaried developers, but
the goal of coming to Apache is to form a community of users and developers
that is much more diverse including non-salaried developers.

=== Relationships with Other Apache Products ===
Metron has a strong relationship and dependency with Apache Flume, Hadoop,
HBase, Hive, Kafka, Spark, and Storm. Being part of Apache’s Incubation
community could help with closer collaboration among these projects as
well as others.

We note that although there is a superficial resemblance to Apache Eagle,
which does security analysis of Hadoop audit events, the projects are
significantly different. In particular, Metron is focused on analyzing
network packet traffic and thus has a very different scope and scale of
events than Eagle.

=== An Excessive Fascination with the Apache Brand ===

While the Apache brand is important, we are much more interested in finding
a home for the project that encourages open development and open
governance. We want to form the new community using the Apache Way with its
strong focus on meritocracy, organizational independence, and open
development.

== Documentation ==
The current information on the OpenSOC project is here:
http://opensoc.github.io/
A slide deck presenting background material is here:
http://www.slideshare.net/JamesSirota/cisco-opensoc

== Initial Source ==
The initial code is on github:  http://opensoc.github.io/

== External Dependencies ==
Metron has the following external dependencies:
  * Apache Flume
  * Apache Hadoop
  * Apache HBase
  * Apache Hive
  * Apache Kafka
  * Apache Spark
  * Apache Storm
  * ElasticSearch
  * MySQL

The project understands that it will need to support alternatives to MySQL
that are licensed under an ALv2-compatible license.

== Cryptography ==
Metron will eventually support encryption on the wire, but this is not one
of the initial goals, and we do not expect Metron to be a controlled export
item due to the use of encryption. Metron supports but does not require the
Kerberos authentication mechanism to access secured Hadoop services.

== Required Resources ==

=== Mailing List ===

  * metron-private for private PMC discussions
  * metron-dev for developers
  * metron-commits for all commits
  * metron-users for all users

=== Version Control ===
Git is the preferred source control system.

=== Issue Tracking ===

  * JIRA (METRON)

=== Other Resources ===
The existing code already has unit tests so we will make use of existing
Apache continuous testing infrastructure. The resulting load should not be
very large.

== Initial Committers ==
  * Jim Baker < jim.baker at rackspace dot com >
  * Mark Bittmann < mark at b23 dot io >
  * Sheetal Dolas < sheetal at hortonworks dot com >
  * Discovery Gerdes < discovery.gerdes at rackspace dot com >
  * Andrew Hartnett < andrew.hartnett at rackspace dot com >
  * Dave Hirko < dave at b23 dot io >
  * Paul Kehrer < paul.kehrer at rackspace dot com >
  * Brad Kolarov < brad at b23 dot io >
  * Kiran Komaravolu <kkomaravolu at hortonworks dot com >
  * Ryan Merriman < rmerriman at hortonworks dot com >
  * Michael Perez <michael.perez at hortonworks dot com>
  * Charles Porter <Charles.Porter at mcs dot mantech dot com >
  * Sean Schulte < sean.schulte at rackspace dot com >
  * James Sirota < jsirota at hortonworks dot com >
  * Casey Stella < cstella at hortonworks dot com >
  * Bryan Taylor < bryan.taylor at rackspace dot com >
  * Ray Urciuoli < Ray.Urciuoli at mcs dot mantech dot com >
  * Vinod Kumar Vavilapalli < vinodkv at apache dot org >
  * George Vetticaden < gvetticaden at hortonworks dot com >
  * Oskar Zabik < oskar.zabik at rackspace dot com >

== Affiliations ==
The initial committers are employees of:
  * Jim Baker - Rackspace
  * Mark Bittmann - B23
  * Sheetal Dolas - Hortonworks
  * Discovery Gerdes - Rackspace
  * Andrew Hartnett - Rackspace
  * Dave Hirko - B23
  * Paul Kehrer - Rackspace
  * Brad Kolarov - B23
  * Kiran Komaravolu - Hortonworks
  * Ryan Merriman - Hortonworks
  * Michael Perez - Hortonworks
  * Charles Porter - Mantech
  * Sean Schulte - Rackspace
  * James Sirota - Hortonworks
  * Casey Stella - Hortonworks
  * Bryan Taylor - Rackspace
  * Ray Urciuoli - Mantech
  * Vinod Kumar Vavilapalli - Hortonworks
  * George Vetticaden - Hortonworks
  * Oskar Zabik - Rackspace

== Sponsors ==

=== Champion ===
  * Owen O’Malley - Apache IPMC member

=== Nominated Mentors ===
  * Chris Mattmann <mattmann at apache dot org > - Apache IPMC member, NASA
  * Owen O’Malley <omalley at apache dot org > - Apache IPMC member,
Hortonworks
  * Billie Rinaldi < billie at apache dot org > - Apache IPMC member,
Hortonworks
  * Vinod Kumar Vavilapalli < vinodkv at apache dot org > - Apache IPMC
member, Hortonworks

=== Sponsoring Entity ===
We are requesting the Incubator to sponsor this project.
