apex-commits mailing list archives

From t..@apache.org
Subject incubator-apex-core git commit: Removing duplicate Apex Malhar page from core docs
Date Wed, 02 Mar 2016 22:28:48 GMT
Repository: incubator-apex-core
Updated Branches:
  refs/heads/APEXCORE-293 254beb4f7 -> 5dd0d8406


Removing duplicate Apex Malhar page from core docs


Project: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/commit/5dd0d840
Tree: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/tree/5dd0d840
Diff: http://git-wip-us.apache.org/repos/asf/incubator-apex-core/diff/5dd0d840

Branch: refs/heads/APEXCORE-293
Commit: 5dd0d84062e6e2a2390d04dcc5c8dc88ed66c491
Parents: 254beb4
Author: Sasha Parfenov <sashap@apache.org>
Authored: Tue Mar 1 19:44:45 2016 -0800
Committer: Sasha Parfenov <sashap@apache.org>
Committed: Tue Mar 1 19:44:45 2016 -0800

----------------------------------------------------------------------
 docs/apex_malhar.md | 59 ------------------------------------------------
 docs/index.md       |  3 +--
 mkdocs.yml          |  1 -
 3 files changed, 1 insertion(+), 62 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/5dd0d840/docs/apex_malhar.md
----------------------------------------------------------------------
diff --git a/docs/apex_malhar.md b/docs/apex_malhar.md
deleted file mode 100644
index 45dee76..0000000
--- a/docs/apex_malhar.md
+++ /dev/null
@@ -1,59 +0,0 @@
-Apache Apex Malhar
-================================================================================
-
-Apache Apex Malhar is an open source operator and codec library that can be used with the [Apache Apex](http://apex.apache.org/) platform to build real-time streaming applications.  Enabling users to extract value quickly, Malhar operators help get data in, analyze it in real-time, and get data out of Hadoop.  In addition to the operators, the library contains a number of demos applications, demonstrating operator features and capabilities.  To see the full list of available operators and related documentation, visit [Apex Malhar on Github](https://github.com/apache/incubator-apex-malhar)
-
-![MalharDiagram](images/malhar-operators.png)
-
-# Capabilities common across Malhar operators
-
-For most streaming platforms, connectors are afterthoughts and often end up being simple ‘bolt-ons’ to the platform. As a result they often cause performance issues or data loss when put through failure scenarios and scalability requirements. Malhar operators do not face these issues as they were designed to be integral parts of Apex. Hence, they have following core streaming runtime capabilities
-
-1.  **Fault tolerance** – Malhar operators where applicable have fault tolerance built in. They use the checkpoint capability provided by the framework to ensure that there is no data loss under ANY failure scenario.
-2.  **Processing guarantees** – Malhar operators where applicable provide out of the box support for ALL three processing guarantees – exactly once, at-least once, and at-most once WITHOUT requiring the user to write any additional code.  Some operators, like MQTT operator, deal with source systems that can not track processed data and hence need the operators to keep track of the data.  Malhar has support for a generic operator that uses alternate storage like HDFS to facilitate this.  Finally for databases that support transactions or support any sort of atomic batch operations Malhar operators can do exactly once down to the tuple level.
-3.  **Dynamic updates** – Based on changing business conditions you often have to tweak several parameters used by the operators in your streaming application without incurring any application downtime. You can also change properties of a Malhar operator at runtime without having to bring down the application.
-4.  **Ease of extensibility** – Malhar operators are based on templates that are easy to extend.
-5.  **Partitioning support** – In streaming applications the input data stream often needs to be partitioned based on the contents of the stream. Also for operators that ingest data from external systems partitioning needs to be done based on the capabilities of the external system.  For example with Kafka, the operator can automatically scale up or down based on the changes in the number of Kafka partitions.
-
-# Operator Library Overview
-
-## Input/output connectors
-
-Below is a summary of the various sub categories of input and output operators. Input operators also have a corresponding output operator
-
-*   **File Systems** – Most streaming analytics use cases require the data to be stored in HDFS or perhaps S3 if the application is running in AWS.  Users often need to re-run their streaming analytical applications against historical data or consume data from upstream processes that are perhaps writing to some NFS share.  Apex supports input & output operators for HDFS, S3, NFS & Local Files.  There are also File Splitter and Block Reader operators, which can accelecate processing of large files by splitting and paralellizing the work across non-overlapping sets of file blocks.
-*   **Relational Databases** – Most stream processing use cases require some reference data lookups to enrich, tag or filter streaming data. There is also a need to save results of the streaming analytical computation to a database so an operational dashboard can see them. Apex supports a JDBC operator so you can read/write data from any JDBC compliant RDBMS like Oracle, MySQL, Sqlite, etc.
-*   **NoSQL Databases** – NoSQL key-value pair databases like Cassandra & HBase are a common part of streaming analytics application architectures to lookup reference data or store results.  Malhar has operators for HBase, Cassandra, Accumulo, Aerospike, MongoDB, and CouchDB.
-*   **Messaging Systems** – Kafka, JMS, and similar systems are the workhorses of messaging infrastructure in most enterprises.  Malhar has a robust, industry-tested set of operators to read and write Kafka, JMS, ZeroMQ, and RabbitMQ messages.
-*   **Notification Systems** – Malhar includes an operator for sending notifications via SMTP.
-*   **In-memory Databases & Caching platforms** - Some streaming use cases need instantaneous access to shared state across the application. Caching platforms and in-memory databases serve this purpose really well. To support these use cases, Malhar has operators for memcached and Redis.
-*   **Social Media** - Malhar includes an operator to connect to the popular Twitter stream fire hose.
-*   **Protocols** - Malhar provides connectors that can communicate in HTTP, RSS, Socket, WebSocket, FTP, and MQTT.
-
-## Parsers
-
-There are many industry vertical specific data formats that a streaming application developer might need to parse. Often there are existing parsers available for these that can be directly plugged into an Apache Apex application. For example in the Telco space, a Java based CDR parser can be directly plugged into Apache Apex operator. To further simplify development experience, Malhar also provides some operators for parsing common formats like XML (DOM & SAX), JSON (flat map converter), Apache log files, syslog, etc.
-
-## Stream manipulation
-
-Streaming data inevitably needs processing to clean, filter, tag, summarize, etc. The goal of Malhar is to enable the application developer to focus on WHAT needs to be done to the stream to get it in the right format and not worry about the HOW.  Malhar has several operators to perform the common stream manipulation actions like – GroupBy, Join, Distinct/Unique, Limit, OrderBy, Split, Sample, Inner join, Outer join, Select, Update etc.
-
-## Compute
-
-One of the most important promises of a streaming analytics platform like Apache Apex is the ability to do analytics in real-time. However delivering on the promise becomes really difficult when the platform does not provide out of the box operators to support variety of common compute functions as the user then has to worry about making these scalable, fault tolerant, stateful, etc.  Malhar takes this responsibility away from the application developer by providing a variety of out of the box computational operators.
-
-Below is just a snapshot of the compute operators available in Malhar
-
-*   Statistics and math - Various mathematical and statistical computations over application defined time windows.
-*   Filtering and pattern matching
-*   Sorting, maps, frequency, TopN, BottomN
-*   Random data generators
-
-## Languages Support
-
-Migrating to a new platform often requires re-use of the existing code that would be difficult or time-consuming to re-write.  With this in mind, Malhar supports invocation of code written in other languages by wrapping them in one of the library operators, and allows execution of software written in:
-
-* JavaScript
-* Python
-* R
-* Ruby
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/5dd0d840/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 6a78abf..71d4e6b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -12,8 +12,7 @@ Apex is a Hadoop YARN native big data processing platform, enabling real time st
 
 Platform has been demonstated to scale linearly across Hadoop clusters under extreme loads of billions of events per second.  Hardware and process failures are quickly recovered with HDFS-backed checkpointing and automatic operator recovery, preserving application state and resuming execution in seconds.  Functional and operational specifications are separated.  Apex provides a simple API, which enables users to write generic, reusable code.  The code is dropped in as-is and platform automatically handles the various operational concerns, such as state management, fault tolerance, scalability, security, metrics, etc.  This frees users to focus on functional development, and lets platform provide operability support.
 
-The core Apex platform is supplemented by Malhar, a library of connector and logic functions, enabling rapid application development.  These operators and modules provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySql, Cassandra, MongoDB, Redis, HBase, CouchDB, generic JDBC, and other database connectors. The Malhar library also includes a host of other common business logic patterns that help users to significantly reduce the time it takes to go into production.  Ease of integration with all other big data technologies is one of the primary missions of Malhar.
-
+The core Apex platform is supplemented by Malhar, a library of connector and logic functions, enabling rapid application development.  These operators and modules provide access to HDFS, S3, NFS, FTP, and other file systems; Kafka, ActiveMQ, RabbitMQ, JMS, and other message systems; MySql, Cassandra, MongoDB, Redis, HBase, CouchDB, generic JDBC, and other database connectors.  In addition to the operators, the library contains a number of demos applications, demonstrating operator features and capabilities.  To see the full list of available operators and related documentation, visit [Apex Malhar on Github](https://github.com/apache/incubator-apex-malhar)
 
 For additional information visit [Apache Apex (incubating)](http://apex.incubator.apache.org/).
 

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/5dd0d840/mkdocs.yml
----------------------------------------------------------------------
diff --git a/mkdocs.yml b/mkdocs.yml
index c6a26d7..c91718c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -3,7 +3,6 @@ site_favicon: favicon.ico
 theme: readthedocs
 pages:
 - Apache Apex: index.md
-- Apache Apex-Malhar: apex_malhar.md
 - Development:
     - Development Setup: apex_development_setup.md
     - Applications: application_development.md

