incubator-general mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: [VOTE] Accept Eagle into Apache Incubation
Date Fri, 23 Oct 2015 15:50:54 GMT
+1 (binding)

On Fri, Oct 23, 2015 at 8:42 AM, wp chun <wp_chun@hotmail.com> wrote:

> +1
> wp_chun@hotmail.com
> >
> > On 10/23/15, 11:26 PM, "P. Taylor Goetz" <ptgoetz@gmail.com> wrote:
> >
> > >+1 (binding)
> > >
> > >-Taylor
> > >
> > >> On Oct 23, 2015, at 10:11 AM, Manoharan, Arun <armanoharan@ebay.com>
> > >>wrote:
> > >>
> > >> Hello Everyone,
> > >>
> > >> Thanks for all the feedback on the Eagle Proposal.
> > >>
> > >> I would like to call for a [VOTE] on Eagle joining the ASF as an
> > >>incubation project.
> > >>
> > >> The vote is open for 72 hours:
> > >>
> > >> [ ] +1 accept Eagle in the Incubator
> > >> [ ] ±0
> > >> [ ] -1 (please give reason)
> > >>
> > >> Eagle is a monitoring solution for Hadoop that instantly identifies
> > >>access to sensitive data, recognizes attacks and malicious activities,
> > >>and takes action in real time. Eagle supports a wide variety of
> > >>policies on HDFS data and Hive. Eagle also provides machine learning
> > >>models for detecting anomalous user behavior in Hadoop.
> > >>
> > >> The proposal is available on the wiki here:
> > >> https://wiki.apache.org/incubator/EagleProposal
> > >>
> > >> The text of the proposal is also available at the end of this email.
> > >>
> > >> Thanks for your time and help.
> > >>
> > >> Thanks,
> > >> Arun
> > >>
> > >> <COPY of the proposal in text format>
> > >>
> > >> Eagle
> > >>
> > >> Abstract
> > >> Eagle is an open source monitoring solution for Hadoop that instantly
> > >>identifies access to sensitive data, recognizes attacks and malicious
> > >>activities in Hadoop, and takes action.
> > >>
> > >> Proposal
> > >> Eagle audits access to HDFS files, Hive and HBase tables in real time,
> > >>enforces policies defined on sensitive data access and alerts or blocks
> > >>user's access to that sensitive data in real time. Eagle also creates
> > >>user profiles based on the typical access behaviour for HDFS and Hive
> > >>and sends alerts when anomalous behaviour is detected. Eagle can also
> > >>import sensitive data information classified by external classification
> > >>engines to help define its policies.
> > >>
> > >> Overview of Eagle
> > >> Eagle has 3 main parts.
> > >> 1. Data collection and storage - Eagle collects data from various
> > >>Hadoop logs in real time using the Kafka and YARN APIs and uses HDFS
> > >>and HBase for storage.
> > >> 2. Data processing and policy engine - Eagle allows users to create
> > >>policies based on various metadata properties of HDFS, Hive and HBase
> > >>data.
> > >> 3. Eagle services - Eagle services include the policy manager, the
> > >>query service and the visualization component. Eagle provides an
> > >>intuitive user interface to administer Eagle and an alert dashboard to
> > >>respond to real-time alerts.
> > >>
> > >> Data Collection and Storage:
> > >> Eagle provides a programming API for extending Eagle to integrate any
> > >>data source into the Eagle policy evaluation framework. For example,
> > >>Eagle HDFS audit monitoring collects data from Kafka, which is
> > >>populated by the NameNode log4j appender or by a Logstash agent. Eagle
> > >>Hive monitoring collects Hive query logs from running jobs through the
> > >>YARN API, which is designed to be scalable and fault-tolerant. Eagle
> > >>uses HBase to store metadata and metrics data, and also supports a
> > >>relational database through a configuration change.
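A minimal sketch of such an extension point, written in Python for brevity even though Eagle itself targets the JVM. The `DataSource` class, the `register` decorator, the `hive_query_log` name and the job-record fields are all hypothetical and are not Eagle's actual API; the idea illustrated is only that a new source plugs in by producing events, with no change to the policy evaluation side.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator

# An event is a flat mapping of attribute name to value (assumption for this sketch).
Event = Dict[str, object]


class DataSource(ABC):
    """Hypothetical plug-in contract: anything that yields events for policy evaluation."""

    @abstractmethod
    def events(self) -> Iterator[Event]:
        ...


class LineSource(DataSource):
    """Turns raw log lines (e.g. records consumed from a Kafka topic) into events."""

    def __init__(self, lines: Iterable[str], parse: Callable[[str], Event]):
        self._lines = lines
        self._parse = parse

    def events(self) -> Iterator[Event]:
        for line in self._lines:
            yield self._parse(line)


# Registry of source factories keyed by name, so the evaluation side only
# looks sources up and never needs to know their concrete type.
SOURCES: Dict[str, Callable[..., DataSource]] = {}


def register(name: str):
    def wrap(factory: Callable[..., DataSource]) -> Callable[..., DataSource]:
        SOURCES[name] = factory
        return factory
    return wrap


@register("hive_query_log")
def hive_query_source(jobs: Iterable[dict]) -> DataSource:
    # Hypothetical job records carrying the submitting user and the Hive query text.
    lines = (f"{job['user']}\t{job['query']}" for job in jobs)
    return LineSource(lines, lambda line: dict(zip(("user", "query"), line.split("\t", 1))))
```

Usage: `SOURCES["hive_query_log"]([{"user": "alice", "query": "SELECT * FROM t"}]).events()` yields one event with `user` and `query` attributes.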
> > >>
> > >> Data Processing and Policy Engine:
> > >> Processing Engine: Eagle provides a stream processing API which is an
> > >>abstraction over Apache Storm. It can also be extended to other
> > >>streaming engines. This abstraction allows developers to assemble data
> > >>transformation, filtering, external data joins, etc. without being
> > >>physically bound to a specific streaming platform. The Eagle streaming
> > >>API allows developers to easily integrate business logic with the Eagle
> > >>policy engine, and internally the Eagle framework compiles the business
> > >>logic execution DAG into program primitives of the underlying stream
> > >>infrastructure, e.g. Apache Storm. For example, Eagle HDFS monitoring
> > >>transforms NameNode audit logs into objects and joins them with
> > >>sensitivity metadata and security zone metadata, which are generated
> > >>by external programs or configured by the user. Eagle Hive monitoring
> > >>filters running jobs to get the Hive query string, parses the query
> > >>string into an object and then joins it with sensitivity metadata.
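The HDFS example above (parse the audit log, join sensitivity metadata, keep sensitive accesses) can be sketched as an in-process stand-in. The `Stream` class, its `run_locally` method and the `SENSITIVITY` table are hypothetical, the audit-line layout is assumed to be the tab-separated `key=value` form written by the NameNode audit logger, and none of this is Eagle's actual streaming API. The steps are recorded as data first, so in principle a different backend could translate the same list into Storm primitives instead of running them locally.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, object]
Step = Callable[[Iterator], Iterator]


class Stream:
    """Engine-neutral pipeline description: an ordered list of steps, not yet bound to a runtime."""

    def __init__(self, steps: Optional[List[Step]] = None):
        self.steps: List[Step] = steps or []

    def map(self, fn: Callable) -> "Stream":
        return Stream(self.steps + [lambda it: (fn(e) for e in it)])

    def filter(self, pred: Callable[[Event], bool]) -> "Stream":
        return Stream(self.steps + [lambda it: (e for e in it if pred(e))])

    def run_locally(self, source: Iterable) -> List:
        # Stand-in for compiling the steps onto a real engine such as Storm:
        # here each step simply wraps the previous iterator in-process.
        it: Iterator = iter(source)
        for step in self.steps:
            it = step(it)
        return list(it)


def parse_audit(line: str) -> Event:
    # Assumes the NameNode audit layout "<prefix>audit: key=value<TAB>key=value...".
    body = line.split("audit: ", 1)[1]
    return dict(field.split("=", 1) for field in body.split("\t"))


# Hypothetical sensitivity metadata: HDFS path prefix -> sensitivity label.
SENSITIVITY = {"/data/pii": "PII", "/data/finance": "FINANCE"}


def join_sensitivity(event: Event) -> Event:
    # Longest matching path prefix wins; paths with no match get None.
    src = str(event.get("src", ""))
    matches = [p for p in SENSITIVITY if src == p or src.startswith(p + "/")]
    label = SENSITIVITY[max(matches, key=len)] if matches else None
    return {**event, "sensitivity": label}


sensitive_access = (
    Stream()
    .map(parse_audit)
    .map(join_sensitivity)
    .filter(lambda e: e["sensitivity"] is not None)
)
```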
> > >> Alerting Framework: The Eagle Alert Framework includes a stream
> > >>metadata API, a scalable policy engine framework and an extensible
> > >>policy engine framework. The stream metadata API allows developers to
> > >>declare the event schema, including what attributes constitute an
> > >>event, the type of each attribute, and how to dynamically resolve
> > >>attribute values at runtime when a user configures a policy. The
> > >>scalable policy engine framework allows policies to be executed on
> > >>different physical nodes in parallel; it also lets you define your own
> > >>policy partitioner class. Together with the stream partitioning
> > >>capability provided by all streaming platforms, the policy engine
> > >>framework makes sure policies and events can be evaluated in a fully
> > >>distributed way. The extensible policy engine framework allows
> > >>developers to plug in a new policy engine with a few lines of code.
> > >>The WSO2 Siddhi CEP engine is the policy engine that Eagle supports as
> > >>a first-class citizen.
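A minimal sketch of how the schema, partitioning and evaluation pieces could fit together, assuming a schema declared as attribute-to-type pairs, policies represented as plain Python predicates (in Eagle the conditions would be Siddhi queries), and a default partitioner that hashes the policy id. `HDFS_AUDIT_SCHEMA`, `Policy`, `conforms`, `partition` and `evaluate` are hypothetical names, not Eagle's API.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stream metadata: each attribute of an HDFS audit event and its declared type.
HDFS_AUDIT_SCHEMA: Dict[str, type] = {"user": str, "cmd": str, "src": str, "sensitivity": str}


@dataclass(frozen=True)
class Policy:
    policy_id: str
    condition: Callable[[dict], bool]  # stands in for a Siddhi query in this sketch


def conforms(event: dict, schema: Dict[str, type]) -> bool:
    """True if the event carries every declared attribute with the declared type."""
    return all(isinstance(event.get(name), typ) for name, typ in schema.items())


def partition(policies: List[Policy], num_nodes: int) -> Dict[int, List[Policy]]:
    """Assign each policy to exactly one node by a stable hash of its id."""
    slots: Dict[int, List[Policy]] = {node: [] for node in range(num_nodes)}
    for policy in policies:
        # md5 instead of hash(): Python salts str hashes per process, which
        # would give different assignments on different machines.
        digest = hashlib.md5(policy.policy_id.encode("utf-8")).hexdigest()
        slots[int(digest, 16) % num_nodes].append(policy)
    return slots


def evaluate(event: dict, node_policies: List[Policy]) -> List[str]:
    """Return the ids of this node's policies that the event matches."""
    return [p.policy_id for p in node_policies if p.condition(event)]
```

Hashing the policy id keeps the assignment deterministic across restarts; a custom partitioner would replace `partition` when some policies are much heavier than others. Each node must still receive the events its policies need, which is where the streaming platform's own partitioning comes in.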
> > >> Machine Learning module: Eagle provides capabilities to define user
> > >>activity patterns or user profiles for Hadoop users based on their
> > >>behaviour on the platform. These user profiles are modeled using
> > >>machine learning algorithms and used to detect anomalous user
> > >>activities. Eagle uses eigenvalue decomposition and density estimation
> > >>algorithms to generate user profile models. The modeling pipeline
> > >>reads data from HDFS audit logs, preprocesses and aggregates it, and
> > >>generates models using the Spark programming APIs. Once the models are
> > >>generated, Eagle uses the stream processing engine for near real-time
> > >>anomaly detection to determine whether any user's activities are
> > >>suspicious.
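The proposal does not spell out Eagle's exact models, so the following is only a generic illustration of the two named techniques on hypothetical per-user feature vectors (for example, daily counts of HDFS `open`, `delete` and `rename` commands). Eigenvalue decomposition of the covariance matrix gives the principal directions of a user's normal behaviour, and a new vector is scored by its distance from that subspace; the density estimate is a deliberately simple one, an independent Gaussian per feature. All function names are hypothetical, and the sketch uses NumPy locally rather than Spark.

```python
import numpy as np


def fit_eigen_model(X: np.ndarray, k: int):
    """Mean and top-k eigenvectors of the covariance of a user's history (rows = days)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # symmetric matrix: eigh, ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # columns = principal directions
    return mean, top


def residual_score(x: np.ndarray, model) -> float:
    """Distance between x and its projection onto the principal subspace."""
    mean, top = model
    centered = x - mean
    projected = top @ (top.T @ centered)
    return float(np.linalg.norm(centered - projected))


def fit_density(X: np.ndarray, eps: float = 1e-6):
    """Per-feature mean and standard deviation (eps avoids division by zero)."""
    return X.mean(axis=0), X.std(axis=0) + eps


def log_density(x: np.ndarray, model) -> float:
    """Log-likelihood of x under independent Gaussians; low values suggest anomalies."""
    mu, sigma = model
    z = (x - mu) / sigma
    return float(np.sum(-0.5 * z ** 2 - np.log(sigma * np.sqrt(2 * np.pi))))
```

In a streaming setting, the models would be fitted offline and only `residual_score` and `log_density` would run per event, compared against a threshold chosen for each user.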
> > >>
> > >> Eagle Services:
> > >> Query Service: Eagle provides a SQL-like service API that supports
> > >>computation over huge data sets on the fly, e.g. filtering,
> > >>aggregation, histograms, sorting, top, arithmetic expressions and
> > >>pagination. HBase is the data store that Eagle supports as a
> > >>first-class citizen; relational databases are supported as well. For
> > >>HBase storage, the Eagle query framework compiles the user-provided
> > >>SQL-like query into HBase native filter objects and executes it on the
> > >>fly through an HBase coprocessor.
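To make the idea of compiling a query concrete, here is a minimal in-memory sketch assuming a hypothetical filter syntax of `field=value` clauses joined by `AND`; it is not Eagle's query language. The filter text is compiled once into a single predicate (the role played by HBase native filter objects in Eagle), and counting, sorting, top and pagination then run over the matching rows. In Eagle the compiled filter would be pushed down to HBase rather than evaluated in Python.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def compile_filter(expr: str) -> Callable[[Row], bool]:
    """Compile 'field=value AND field=value' (hypothetical syntax) into one predicate."""
    clauses: List[Tuple[str, str]] = []
    for clause in expr.split(" AND "):
        field, value = (part.strip() for part in clause.split("=", 1))
        clauses.append((field, value))
    return lambda row: all(row.get(f) == v for f, v in clauses)


def top_groups(
    rows: List[Row],
    predicate: Callable[[Row], bool],
    group_by: str,
    top: int,
    page: int = 0,
    page_size: int = 10,
) -> List[Tuple[str, int]]:
    """Filter, count rows per group, keep the `top` largest groups, then return one page."""
    counts = Counter(row[group_by] for row in rows if predicate(row))
    ranked = counts.most_common(top)
    start = page * page_size
    return ranked[start:start + page_size]
```

Usage: `top_groups(rows, compile_filter("cmd=open"), "user", top=2)` returns the two users with the most `open` operations, largest first.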
> > >> Policy Manager: The Eagle policy manager provides a UI and a RESTful
> > >>API for users to define policies with just a few clicks. It includes a
> > >>site management UI, a policy editor, sensitivity metadata import, HDFS
> > >>or Hive sensitive resource browsing, alert dashboards, etc.
> > >>
> > >> Background
> > >> Data is one of the most important assets for today's businesses,
> > >>which makes data security one of the top priorities of today's
> > >>enterprises. Hadoop is widely used across different verticals as a big
> > >>data repository to store this data in most modern enterprises.
> > >> At eBay we use the Hadoop platform extensively for our data
> > >>processing needs. Our data in Hadoop keeps growing as our user base
> > >>sees exponential growth. Today a variety of data sets are available in
> > >>our Hadoop clusters for our users to consume. eBay has around 120 PB of
> > >>data stored in HDFS across 6 different clusters and 1800+ active
> > >>Hadoop users consuming data through Hive, HBase and MapReduce jobs
> > >>every day to build applications using this data. With this
> > >>astronomical growth of data there are also challenges in securing
> > >>sensitive data and monitoring access to it. Today in large
> > >>organizations HDFS is the de facto standard for storing big data. Data
> > >>sets including, but not limited to, consumer sentiment, social media
> > >>data, customer segmentation, web clicks, sensor data, geo-location and
> > >>transaction data are stored in Hadoop for day-to-day business needs.
> > >> We at eBay want to make sure that sensitive data and data platforms
> > >>are completely protected from security breaches. So we partnered very
> > >>closely with our Information Security team to understand the
> > >>requirements for Eagle to monitor sensitive data access on Hadoop:
> > >> 1. Ability to identify and stop security threats in real time
> > >> 2. Scale for big data (support PB scale and billions of events)
> > >> 3. Ability to create data access policies
> > >> 4. Support multiple data sources like HDFS, HBase, Hive
> > >> 5. Visualize alerts in real time
> > >> 6. Ability to block malicious access in real time
> > >> We did not find any data access monitoring solution available today
> > >>that provides the features and functionality we need to monitor data
> > >>access in the Hadoop ecosystem at our scale. Hence, with an excellent
> > >>team of world-class developers and several users, we have been able to
> > >>bring Eagle into production as well as open source it.
> > >>
> > >> Rationale
> > >> In today's world, data is an important asset for any company.
> > >>Businesses are using data extensively to create amazing experiences for
> > >>users. Data has to be protected and access to data should be secured
> > >>against security breaches. Today Hadoop is used not only to store logs
> > >>but also to store financial data, sensitive data sets, geographical
> > >>data, user click stream data sets, etc., which makes protecting it from
> > >>security breaches all the more important. To secure a data platform,
> > >>multiple things need to happen. One is a strong access control
> > >>mechanism, which today is provided by Apache Ranger and Apache Sentry.
> > >>These tools provide fine-grained access control for data sets on
> > >>Hadoop. But there is a big gap in monitoring all the data access
> > >>events and activities needed to secure the Hadoop data platform. With
> > >>strong access control, perimeter security and data access monitoring
> > >>in place, data in Hadoop clusters can be secured against breaches. We
> > >>looked around and found the following:
> > >> Existing data activity monitoring products are designed for
> > >>traditional databases and data warehouses. Existing monitoring
> > >>platforms cannot scale out to support fast-growing data at petabyte
> > >>scale. The few products in the industry that support HDFS, Hive and
> > >>HBase data access monitoring are still at a very early stage.
> > >> As mentioned in the background, the business requirement and urgency
> > >>to secure data from users with malicious intent drove eBay to invest
> > >>in building a real-time data access monitoring solution from scratch
> > >>that offers real-time alerts and remediation features for malicious
> > >>data access.
> > >> With the power of open source distributed systems like Hadoop, Kafka
> > >>and many more, we were able to develop a data activity monitoring
> > >>system that can scale, identify and stop malicious access in real time.
> > >> Eagle allows admins to create standard access policies and rules for
> > >>monitoring HDFS, Hive and HBase data. Eagle also provides out-of-the-box
> > >>machine learning models for modeling user profiles based on user access
> > >>behaviour and uses the models to alert on anomalies.
> > >>
> > >> Current Status
> > >>
> > >> Meritocracy
> > >> Eagle has been deployed in production at eBay for monitoring billions
> > >>of events per day from HDFS and Hive operations. From the start, the
> > >>product has been built with high scalability and application
> > >>extensibility in mind, and Eagle has demonstrated great performance in
> > >>responding to suspicious events instantly and great flexibility in
> > >>defining policies.
> > >>
> > >> Community
> > >> Eagle seeks to develop the developer and user communities during
> > >>incubation.
> > >>
> > >> Core Developers
> > >> Eagle is currently being designed and developed by engineers from eBay
> > >>Inc.: Edward Zhang, Hao Chen, Chaitali Gupta, Libin Sun, Jilin Jiang,
> > >>Qingwen Zhao, Senthil Kumar, Hemanth Dendukuri, Arun Manoharan. All of
> > >>these core developers have deep expertise in developing monitoring
> > >>products for the Hadoop ecosystem.
> > >>
> > >> Alignment
> > >> The ASF is a natural host for Eagle given that it is already the home
> > >>of Hadoop, HBase, Hive, Storm, Kafka, Spark and other emerging big data
> > >>projects. Eagle leverages a lot of Apache open-source products. Eagle
> > >>was designed to offer real-time insights into sensitive data access by
> > >>actively monitoring data access on various data sets in Hadoop, backed
> > >>by an extensible alerting framework with a powerful policy engine.
> > >>Eagle complements the existing Hadoop platform by providing a
> > >>comprehensive monitoring and alerting solution for detecting sensitive
> > >>data access threats based on preset policies and machine learning
> > >>models for user behaviour analysis.
> > >>
> > >> Known Risks
> > >>
> > >> Orphaned Products
> > >> The core developers of the Eagle team work full time on this project.
> > >>There is no risk of Eagle getting orphaned since eBay is extensively
> > >>using it in its production Hadoop clusters and has plans to go beyond
> > >>Hadoop. For example, there are currently 7 Hadoop clusters and 2 of
> > >>them are being monitored using Hadoop Eagle in production. We have
> > >>plans to extend it to all Hadoop clusters and eventually other data
> > >>platforms. There are tens of policies onboarded and actively monitored,
> > >>with plans to onboard more use cases. We are very confident that every
> > >>Hadoop cluster in the world will be monitored using Eagle for securing
> > >>the Hadoop ecosystem by actively monitoring data access on sensitive
> > >>data. We plan to extend and diversify this community further through
> > >>Apache. We presented Eagle at the Hadoop Summit in China and garnered
> > >>interest from different companies who use Hadoop extensively.
> > >>
> > >> Inexperience with Open Source
> > >> The core developers are all active users and followers of open source.
> > >>They are already committers and contributors to the Eagle Github
> > >>project. All have been involved with the source code that has been
> > >>released under an open source license, and several of them also have
> > >>experience developing code in an open source environment. Though the
> > >>core set of developers does not have Apache open source experience,
> > >>there are plans to onboard individuals with Apache open source
> > >>experience onto the project. Apache Kylin PMC members are also in the
> > >>same eBay organization. We work very closely with Apache Ranger
> > >>committers and are looking forward to finding meaningful integrations
> > >>to improve the security of the Hadoop platform.
> > >>
> > >> Homogenous Developers
> > >> The core developers are from eBay. Today, monitoring data activities
> > >>to find and stop threats is a universal problem faced by all
> > >>businesses. The Apache Incubation process encourages an open and
> > >>diverse meritocratic community. Eagle intends to make every possible
> > >>effort to build a diverse, vibrant and involved community and has
> > >>already received substantial interest from various organizations.
> > >>
> > >> Reliance on Salaried Developers
> > >> eBay invested in Eagle as the monitoring solution for Hadoop clusters
> > >>and some of its key engineers are working full time on the project. In
> > >>addition, since there is a growing need for a data activity
> > >>monitoring solution that secures sensitive data access in Hadoop, we
> > >>look forward to other Apache developers and researchers contributing
> > >>to the project. Additional contributors, including Apache committers,
> > >>plan to join this effort shortly. Also key to addressing the risk
> > >>associated with relying on salaried developers from a single entity is
> > >>to increase the diversity of the contributors and actively lobby for
> > >>domain experts in the security space to contribute. Eagle intends to
> > >>do this.
> > >>
> > >> Relationships with Other Apache Products
> > >> Eagle has a strong relationship and dependency with Apache Hadoop,
> > >>HBase, Spark, Kafka and Storm. Being part of Apache's Incubator
> > >>community could help with closer collaboration among these projects
> > >>as well as others.
> > >>
> > >> An Excessive Fascination with the Apache Brand
> > >> Eagle is proposing to enter incubation at Apache in order to help
> > >>efforts to diversify the committer base, not so much to capitalize on
> > >>the Apache brand. The Eagle project is already in production use
> > >>inside eBay, but is not expected to be an eBay product for external
> > >>customers. As such, the Eagle project is not seeking to use the Apache
> > >>brand as a marketing tool.
> > >>
> > >> Documentation
> > >> Information about Eagle can be found at https://github.com/eBay/Eagle.
> > >>The following link provides more information about Eagle:
> > >>http://goeagle.io
> > >>
> > >> Initial Source
> > >> Eagle has been under development since 2014 by a team of engineers at
> > >>eBay Inc. It is currently hosted on Github.com under an Apache license
> > >>2.0 at https://github.com/eBay/Eagle. Once in incubation we will be
> > >>moving the code base to an Apache Git repository.
> > >>
> > >> External Dependencies
> > >> Eagle has the following external dependencies.
> > >> Basic
> > >> • JDK 1.7+
> > >> • Scala 2.10.4
> > >> • Apache Maven
> > >> • JUnit
> > >> • Log4j
> > >> • Slf4j
> > >> • Apache Commons
> > >> • Apache Commons Math3
> > >> • Jackson
> > >> • Siddhi CEP engine
> > >>
> > >> Hadoop
> > >> • Apache Hadoop
> > >> • Apache HBase
> > >> • Apache Hive
> > >> • Apache Zookeeper
> > >> • Apache Curator
> > >>
> > >> Apache Spark
> > >> • Spark Core Library
> > >>
> > >> REST Service
> > >> • Jersey
> > >>
> > >> Query
> > >> • Antlr
> > >>
> > >> Stream processing
> > >> • Apache Storm
> > >> • Apache Kafka
> > >>
> > >> Web
> > >> • AngularJS
> > >> • jQuery
> > >> • Bootstrap V3
> > >> • Moment JS
> > >> • Admin LTE
> > >> • html5shiv
> > >> • respond
> > >> • Fastclick
> > >> • Date Range Picker
> > >> • Flot JS
> > >>
> > >> Cryptography
> > >> Eagle will eventually support encryption on the wire. This is not one
> > >>of the initial goals, and we do not expect Eagle to be a controlled
> > >>export item due to the use of encryption. Eagle supports but does not
> > >>require the Kerberos authentication mechanism to access secured Hadoop
> > >>services.
> > >>
> > >> Required Resources
> > >>
> > >> Mailing List
> > >> • eagle-private for private PMC discussions
> > >> • eagle-dev for developers
> > >> • eagle-commits for all commits
> > >> • eagle-users for all Eagle users
> > >>
> > >> Subversion Directory
> > >> • Git is the preferred source control system.
> > >>
> > >> Issue Tracking
> > >> • JIRA Eagle (Eagle)
> > >>
> > >> Other Resources
> > >> The existing code already has unit tests so we will make use of
> > >>existing Apache continuous testing infrastructure. The resulting load
> > >>should not be very large.
> > >>
> > >> Initial Committers
> > >> • Seshu Adunuthula <sadunuthula at ebay dot com>
> > >> • Arun Manoharan <armanoharan at ebay dot com>
> > >> • Edward Zhang <yonzhang at ebay dot com>
> > >> • Hao Chen <hchen9 at ebay dot com>
> > >> • Chaitali Gupta <cgupta at ebay dot com>
> > >> • Libin Sun <libsun at ebay dot com>
> > >> • Jilin Jiang <jiljiang at ebay dot com>
> > >> • Qingwen Zhao <qingwzhao at ebay dot com>
> > >> • Hemanth Dendukuri <hdendukuri at ebay dot com>
> > >> • Senthil Kumar <senthilkumar at ebay dot com>
> > >>
> > >>
> > >> Affiliations
> > >> The initial committers are employees of eBay Inc.
> > >>
> > >> Sponsors
> > >>
> > >> Champion
> > >> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> > >>
> > >> Nominated Mentors
> > >> • Owen O'Malley <omalley at apache dot org> - Apache IPMC member,
> > >>Hortonworks
> > >> • Henry Saputra <hsaputra at apache dot org> - Apache IPMC member
> > >> • Julian Hyde <jhyde at hortonworks dot com> - Apache IPMC member,
> > >>Hortonworks
> > >> • Amareshwari Sriramdasu <amareshwari at apache dot org> - Apache IPMC
> > >>member
> > >> • Taylor Goetz <ptgoetz at apache dot org> - Apache IPMC member,
> > >>Hortonworks
> > >>
> > >> Sponsoring Entity
> > >> We are requesting the Incubator to sponsor this project.
> > >>
> > >
> >
>
>
