incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Trivial Update of "ApexProposal" by AmolKekre
Date Tue, 04 Aug 2015 06:13:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "ApexProposal" page has been changed by AmolKekre:
https://wiki.apache.org/incubator/ApexProposal?action=diff&rev1=8&rev2=9

Comment:
Added alignment

  
  The DataTorrent team has always focused on building a robust end-user community of paying
and non-paying customers. We think that the existing community, centered around the existing
Google Groups mailing list, should be relatively easy to transform into an Apache-style community
including both users and developers.
  
+ 
+ == Community ==
+ If Apex is accepted for incubation, the primary initial goal will be transitioning the core
community towards embracing the Apache Way of project governance. We will solicit major existing
contributors to become committers on the project from the start. It should be noted that the
existing community is already more diverse in many ways than some top-level Apache projects.
We expect that we can encourage even more diversity.
+ 
+ == Core Developers ==
+ While a few core developers are skilled in working in openly governed Apache communities,
most of the core developers are currently NOT affiliated with the ASF and would require new
ICLAs before committing to the project. There would also be a learning curve associated with
this on-boarding. Changing current development practices to be more open will be an important
step.
+ 
+ == Alignment ==
+ The following existing ASF projects provide functionality related to that provided by Apex
and should be considered when reviewing the Apex proposal:
+ 
+ Apache Hadoop® is a distributed storage and processing framework for very large datasets,
focusing primarily on batch processing for analytic purposes. Apex is a native YARN application.
The Apex and Malhar roadmap includes plans to continue to leverage YARN, and to help the YARN
community develop the ability to support long-running applications. Apex uses the DFS interface
for its core checkpoint/commit. Malhar has a large number of operators that leverage HDFS and
other Apache projects. Our roadmap includes plans to continue to deepen the currently close
integration with HDFS.
+ 
+ Apache HBase offers tabular data stored in Hadoop based on the Google Bigtable model. Malhar
has HBase connectors to ease integration with HBase. The Malhar roadmap includes plans to continue
to enhance integration with Apache HBase.
+ 
+ Apache Kafka offers distributed and durable publish-subscribe messaging. Malhar integrates
Kafka with Hadoop through feature-rich connectors and supports ingestion of, as well as analytical
functions on, incoming data. Raw data can be ingested from Kafka and results can be written
to Kafka. The Malhar roadmap includes plans to continue to enhance integration with Apache Kafka.
+ 
+ Apache Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data. Malhar has Flume connectors to ease integration
with Flume. These connectors ensure that ingestion with Flume is fault tolerant and thus
can be done in real time with the same SLA as Flume’s HDFS connectors. The Malhar roadmap includes
plans to continue to enhance integration with Apache Flume.
+ 
+ Apache Cassandra is a highly scalable, distributed key-value store that focuses on eventual
consistency. Malhar has connectors to ease integration with Cassandra. The Malhar roadmap includes
plans to continue to enhance integration with Apache Cassandra.
+ 
+ Apache Accumulo is a distributed key-value store based on Google’s Bigtable design. Malhar
has connectors to ease integration with Accumulo. The Malhar roadmap includes plans to continue
to enhance integration with Apache Accumulo.
+ 
+ Apache Tez is aimed at building an application framework which allows for a complex DAG
of tasks for processing data. The Apex and Malhar roadmaps include plans to integrate with Apache
Tez, but this is not currently supported.
+ 
+ Apache ActiveMQ and its subproject Apache Apollo offer a powerful message queue framework.
Malhar has ActiveMQ connectors that ease integration with ActiveMQ.
+ 
+ Apache Spark is an engine for processing large datasets, typically in a Hadoop cluster.
The Malhar project makes it easy for users to integrate with Spark. The Malhar roadmap includes
plans to continue to enhance integration with Apache Spark.
+ 
+ Apache Flink is an engine for scalable batch and stream data processing. The Malhar project
makes it easy for users to integrate with Flink. There is overlap in how Flink leverages a data-in-motion
architecture for both stream and batch processing, and Flink shares our view
that data-in-motion can handle both stream and batch, whereas a batch-only engine will find
it harder to manage streams. We differ in terms of how we handle operability, user-defined
code, metrics, web services, etc. Apex is strongly operations-oriented, while Flink has much more
focus on functional elements. Malhar, with its rapid availability of common business logic, is another
differentiator. We believe both these approaches are valid and that the community and innovation
will gain through cross-pollination. We plan to integrate with Apache Flink via HDFS for
now.
+ 
+ Apache Hive software facilitates querying and managing large datasets residing in distributed
storage. The Malhar project makes it easy for users to integrate with Apache Hive. The Malhar
roadmap includes plans to continue to enhance integration with Apache Hive.
+ 
+ Apache Pig is a platform for analyzing large data sets.  Pig consists of a high-level language
for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
The Apex and Malhar roadmaps include plans to integrate with Apache Pig.
+ 
+ Apache Storm is a distributed realtime computation system. Malhar makes it easy for users
to integrate with Apache Storm. We plan to integrate with Apache Storm via HDFS for now. The Malhar
roadmap includes plans to continue to support mechanisms for integration with Apache Storm.
+ 
+ Apache Samza is a distributed stream processing framework. Malhar makes it easy for users
to integrate with Apache Samza. We plan to integrate with Apache Samza via HDFS or Apache
Kafka for now. The Malhar roadmap includes plans to continue to support mechanisms for integration
with Apache Samza.
+ 
+ Apache Slider is a YARN application to deploy existing distributed applications on YARN,
monitor them, and make them larger or smaller as desired even when the application is running.
Once Slider matures, we will take a look at close integration of Apex with Slider.
+ 
+ Project Malhar and Apex are aligned with many more Apache projects and other open source projects,
as ease of integration with other technologies is one of the primary goals of this project.
These include Apache Solr, Elasticsearch, MongoDB, Aerospike, ZeroMQ, CouchDB, Couchbase,
Memcached, Redis, RabbitMQ, and Apache Derby.
+ 

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

