From: sblackmon@apache.org
To: commits@streams.incubator.apache.org
Date: Mon, 25 Apr 2016 16:07:07 -0000
Subject: [10/11] incubator-streams-master git commit: add architecture and example diagrams

add architecture and example diagrams
rework architecture content


Project: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/commit/749300a8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/tree/749300a8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-streams-master/diff/749300a8

Branch: refs/heads/newwebpage
Commit: 749300a88c5704578ceeb75d06bcb16362c0a467
Parents: 37f6768
Author: Steve Blackmon @steveblackmon
Authored: Thu Apr 21 15:44:15 2016 -0500
Committer: Steve Blackmon @steveblackmon
Committed: Thu Apr 21 15:44:15 2016 -0500

----------------------------------------------------------------------
 pom.xml                             |  47 +++++++-----
 src/site/markdown/architecture.md   |  65 ++++------------
 src/site/markdown/concepts.md       |  56 ++++++++++++++
 src/site/markdown/downloads.md      |   6 +-
 src/site/resources/architecture.dot |  40 ++++++++++
 src/site/resources/example.dot      | 124 +++++++++++++++++++++++++++++++
 src/site/site.xml                   |  19 ++++-
 src/site/site_en.xml                |   9 ++-
 8 files changed, 284 insertions(+), 82 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index af2ff94..509161d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -36,7 +36,6 @@
 2012
- http://streams.incubator.apache.org/site/${project.version}/${project.artifactId}
@@ -297,7 +296,7 @@
 2.2
 0.22
 1.0.3
- 0.4.6
+ 0.4.10
 0.8.3
 0.8.4
 1.2.6
@@ -386,6 +385,9 @@
 src/main/resources
+
+ src/site/diagrams
+
@@ -399,9 +401,6 @@
 true
 true
 target/generated-sources/jsonschema2pojo
-
- src/main/jsonschema
-
 true
 true
@@ -658,23 +657,9 @@
-
- org.apache.maven.plugins
- maven-javadoc-plugin
- ${javadoc.plugin.version}
-
- -Xdoclint:none
- true
- false
- 128m
- 1g
-
-
-
-
@@ -689,7 +674,8 @@
-
+ false
+ license
@@ -718,6 +704,27 @@
 com.github.ferstl
 depgraph-maven-plugin
+
+ org.apache.maven.plugins
+ maven-javadoc-plugin
+ ${javadoc.plugin.version}
+
+ -Xdoclint:none
+ false
+ false
+ 128m
+ 1g
+
+
+
+
+
+ javadoc-no-fork
+ test-javadoc-no-fork
+
+
+
+


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/markdown/architecture.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/architecture.md b/src/site/markdown/architecture.md
index 174fabb..9f457a4 100644
--- a/src/site/markdown/architecture.md
+++ b/src/site/markdown/architecture.md
@@ -1,12 +1,4 @@
-##Architecture
-
-Streams contains libraries and patterns for specifying document schemas and converting documents to and from ActivityStreams format, and runtime bindings for deploying, monitoring, and interfacing with running streams.
-
-In general streams can be characterized as perpetual (capable of running indefinitely) or non-perpetual (expected to run until all providers run out of data).
-
-###Basic Concepts
-
-####Module
+## Architecture

 Apache Streams consists of a loosely coupled set of modules with specific capabilities, such as:

@@ -16,58 +8,27 @@ Apache Streams consists of a loosely coupled set of modules with specific capabi
 - binding streams components to other systems
 - facilitating starting and stopping of streams.

-Each module has it's own POM and dependency tree. Each stream deployment needs to import only the modules it needs for what it wants to do.
-
-####Component
-
-Components are the classes that do stuff within a stream. Components are assembled into pipelines and executed using a runtime. There are several core types of Components, each using a specific java interface:
-
-#####Provider
-
-A Provider is a component that *provides* data to the stream from external systems.
-
-#####Processor
-
-A Processor is a component that *processes* data flowing through the stream - transformations, filters, and enrichments are common processors.
-
-#####PersistWriter
+![Architecture](architecture.dot.svg)

-A PersistWriter is a component that *writes* data exiting the stream.
+#### Modules

-#####PersistReader
-
-A PersistReader is a component that *reads* data, often previously written by a PersistWriter.
-
-####Schema
-
-A Schema defines the expected shape of the documents that will passed from step to step within a stream. Defining the schema for a type of document allows source files and resource files to be generated by the build process, relieving your team of the need to maintain these files by hand.
-
-Schemas can include other schemas, whether in the same repo or available via HTTP, allowing for full or partial reuse within or across organizations.
-
-####Datum
-
-A Datum is a single piece of data within a stream. A datum typically has an identifier, a timestamp, a document (which may be any java object), and additional metadata kept apart from the document related to upstream or downstream processing..
+Each module has it's own POM and dependency tree. Each stream deployment needs to import only the modules it needs for what it wants to do.

-####Activity
+#### Schemas

-Apache Streams has a preference for ActivityStreams formatted messages. These messages may be passed using the 'Activity' class or one of it's sub-classes.
+Streams also contains libraries and patterns for specifying document schemas, converting documents to and from ActivityStreams format, and generating source and resource files for binding to data objects in those formats.

-####ActivityObject
+#### Pipelines

-An activity has several sub-object fields:
+A Pipeline is a set of collection, processing, and storage components structured in a directed graph (cycles may be permitted) which is packaged, deployed, started, and stopped together.

- - actor (required)
- - object (optional)
- - target (optional)
- - generator (optional)
- - provider (optional)
+#### Runtimes

-Streams containing details of actors, objects, etc... may be created using the 'ActivityObject' class or one of it's sub-classes.
+A Runtime is a module containing bindings that help setup and run a pipeline. Runtimes may submit pipeline binaries to an existing cluster, or may launch the process(es) to execute the stream directly.

-####Pipeline
+### Example

-A Pipeline is a set of collection, processing, and storage components structured in a directed graph (cycles may be permitted) which is packaged, deployed, started, and stopped together.
+A standard usage of Apache Streams is to collect, normalize, and archive activity across multiple networks.

-####Runtime
+![Example](example.dot.svg)

-A Runtime is a module containing bindings that help setup and run a pipeline. Runtimes may submit pipeline binaries to an existing cluster, or may launch the process(es) to execute the stream directly.
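The component roles removed from architecture.md here (and moved to concepts.md below) can be illustrated with a minimal, single-threaded Java sketch. The names `Datum`, `Provider`, `Processor`, `PersistWriter`, `run`, and `demo` are hypothetical stand-ins defined for this illustration, not the Apache Streams API. The sketch assumes Java 9+ (for `List.of`) and models only a non-perpetual stream, one that ends when the provider runs out of data, with no runtime, threading, or real persistence.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PipelineSketch {

    // Hypothetical datum: an identifier, a timestamp, a document (any Java object),
    // and metadata kept apart from the document.
    static final class Datum {
        final String id;
        final Instant timestamp;
        final Object document;
        final Map<String, Object> metadata = new HashMap<>();

        Datum(String id, Instant timestamp, Object document) {
            this.id = id;
            this.timestamp = timestamp;
            this.document = document;
        }
    }

    // Hypothetical component roles (not the Apache Streams interfaces).
    interface Provider { Iterator<Datum> read(); }          // provides data from an external system
    interface Processor { List<Datum> process(Datum in); }  // transforms, filters, or enriches
    interface PersistWriter { void write(Datum out); }      // writes data exiting the stream

    // Non-perpetual run: drain the provider once, pass each datum through the
    // processors in order, then hand whatever survives to the writer.
    static void run(Provider provider, List<Processor> processors, PersistWriter writer) {
        Iterator<Datum> source = provider.read();
        while (source.hasNext()) {
            List<Datum> batch = List.of(source.next());
            for (Processor processor : processors) {
                List<Datum> next = new ArrayList<>();
                for (Datum datum : batch) {
                    next.addAll(processor.process(datum));
                }
                batch = next;
            }
            batch.forEach(writer::write);
        }
    }

    public static List<String> demo() {
        List<Datum> input = List.of(
                new Datum("1", Instant.EPOCH, "hello"),
                new Datum("2", Instant.EPOCH, ""),
                new Datum("3", Instant.EPOCH, "world"));
        Provider provider = input::iterator;
        // Filter: returning an empty list drops the datum.
        Processor dropEmpty = d -> ((String) d.document).isEmpty() ? List.<Datum>of() : List.of(d);
        // Transformation: upper-case the document, keeping id and timestamp.
        Processor upperCase = d -> List.of(new Datum(d.id, d.timestamp, ((String) d.document).toUpperCase()));
        List<String> written = new ArrayList<>();
        PersistWriter writer = d -> written.add(d.id + ":" + d.document);
        run(provider, List.of(dropEmpty, upperCase), writer);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1:HELLO, 3:WORLD]
    }
}
```

Having `process` return a list is a design choice of this sketch: the same hook can then express a filter (empty list), a transformation (one datum), or a fan-out (several datums).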

http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/markdown/concepts.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/concepts.md b/src/site/markdown/concepts.md
new file mode 100644
index 0000000..9a580e6
--- /dev/null
+++ b/src/site/markdown/concepts.md
@@ -0,0 +1,56 @@
+### Concepts
+
+####Component
+
+Components are the classes that do stuff within a stream. Components are assembled into pipelines and executed using a runtime. There are several core types of Components, each using a specific java interface:
+
+#####Provider
+
+A Provider is a component that *provides* data to the stream from external systems.
+
+#####Processor
+
+A Processor is a component that *processes* data flowing through the stream - transformations, filters, and enrichments are common processors.
+
+#####PersistWriter
+
+A PersistWriter is a component that *writes* data exiting the stream.
+
+#####PersistReader
+
+A PersistReader is a component that *reads* data, often previously written by a PersistWriter.
+
+####Schema
+
+A Schema defines the expected shape of the documents that will passed from step to step within a stream. Defining the schema for a type of document allows source files and resource files to be generated by the build process, relieving your team of the need to maintain these files by hand.
+
+Schemas can include other schemas, whether in the same repo or available via HTTP, allowing for full or partial reuse within or across organizations.
+
+####Datum
+
+A Datum is a single piece of data within a stream. A datum typically has an identifier, a timestamp, a document (which may be any java object), and additional metadata kept apart from the document related to upstream or downstream processing..
+
+####Activity
+
+Apache Streams has a preference for ActivityStreams formatted messages. These messages may be passed using the 'Activity' class or one of it's sub-classes.
+
+####ActivityObject
+
+An activity has several sub-object fields:
+
+ - actor (required)
+ - object (optional)
+ - target (optional)
+ - generator (optional)
+ - provider (optional)
+
+Streams containing details of actors, objects, etc... may be created using the 'ActivityObject' class or one of it's sub-classes.
+
+####Pipeline
+
+A Pipeline is a set of collection, processing, and storage components structured in a directed graph (cycles may be permitted) which is packaged, deployed, started, and stopped together.
+
+####Runtime
+
+A Runtime is a module containing bindings that help setup and run a pipeline. Runtimes may submit pipeline binaries to an existing cluster, or may launch the process(es) to execute the stream directly.
+


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/markdown/downloads.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/downloads.md b/src/site/markdown/downloads.md
index 9d055d4..f1fae55 100644
--- a/src/site/markdown/downloads.md
+++ b/src/site/markdown/downloads.md
@@ -2,8 +2,6 @@ All downloads can be verified using Apache Streams code signing.

 ## Current Downloads

-### Streams Project
-
-| Version | Source | asc | md5 | sha1 |
+| Artifact | Version | Source | asc | md5 | sha1 |
 |---------|--------|
-| 0.2-incubating | [zip](https://dist.apache.org/repos/dist/release/incubator/streams/releases/streams-project/streams-project/streams-project-0.2-incubating-source-release.zip) | [asc](https://dist.apache.org/repos/dist/release/incubator/streams/releases/0.2-incubating/streams-project/streams-project-0.2-incubating-source-release.zip.asc) | [md5](https://dist.apache.org/repos/dist/release/incubator/streams/releases/0.2-incubating/streams-project/streams-project-0.2-incubating-source-release.zip.md5) | [sha1](https://dist.apache.org/repos/dist/release/incubator/streams/releases/0.2-incubating/streams-project/streams-project-0.2-incubating-source-release.zip.sha1) |
+| streams-project | 0.2-incubating | [zip](https://dist.apache.org/repos/dist/release/incubator/streams/releases/streams-project/streams-project/streams-project-0.2-incubating-source-release.zip) | [asc](https://dist.apache.org/repos/dist/release/incubator/streams/releases/0.2-incubating/streams-project/streams-project-0.2-incubating-source-release.zip.asc) | [md5](https://dist.apache.org/repos/dist/release/incubator/streams/releases/0.2-incubating/streams-project/streams-project-0.2-incubating-source-release.zip.md5) | [sha1](https://dist.apache.org/repos/dist/release/incubator/streams/releases/0.2-incubating/streams-project/streams-project-0.2-incubating-source-release.zip.sha1) |


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/resources/architecture.dot
----------------------------------------------------------------------
diff --git a/src/site/resources/architecture.dot b/src/site/resources/architecture.dot
new file mode 100644
index 0000000..f1aa1d5
--- /dev/null
+++ b/src/site/resources/architecture.dot
@@ -0,0 +1,40 @@
+digraph g {
+
+  graph [compound = true];
+
+  //presentation
+  splines = true;
+  overlap = false;
+  rankdir = TB;
+
+  subgraph cluster_upstream {
+    label="Upstream Systems";
+    upstream_databases [label="Databases"]
+    upstream_generators [label="Generators"]
+    upstream_queues [label="Queues"]
+  }
+
+  subgraph cluster_streams {
+    label="Apache Streams Pipelines";
+    providers [label="Providers"]
+    persistReaders [label="PersistReaders"]
+    processors [label="Processors"]
+    persistWriters [label="PersistWriters"]
+  }
+
+  subgraph cluster_downstream {
+    label="Downstream Systems";
+    downstream_databases [label="Databases"]
+    downstream_queues [label="Queues"]
+  }
+
+  upstream_generators -> providers
+  upstream_queues -> persistReaders
+  upstream_databases -> persistReaders
+  providers,persistReaders -> processors
+  processors -> processors
+  processors -> persistWriters
+  persistWriters -> downstream_databases,downstream_queues
+
+
+}


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/resources/example.dot
----------------------------------------------------------------------
diff --git a/src/site/resources/example.dot b/src/site/resources/example.dot
new file mode 100644
index 0000000..5b97773
--- /dev/null
+++ b/src/site/resources/example.dot
@@ -0,0 +1,124 @@
+digraph g {
+
+  graph [compound = true];
+
+  //presentation
+  splines = true;
+  overlap = false;
+  rankdir = TB;
+
+  subgraph cluster_generators {
+    label="generators";
+    generators_twitter [label="https://api.twitter.com"]
+    generators_facebook [label="https://api.facebook.com"]
+    generators_youtube [label="https://api.youtube.com"]
+    generators_instagram [label="https://api.instagram.com"]
+  }
+
+  subgraph cluster_providers {
+    label="providers";
+    subgraph cluster_providers_twitter {
+      label="twitter";
+      providers_twitter_userstream [label="TwitterUserstreamProvider"]
+      providers_twitter_userinfo [label="TwitterUserInformationProvider"]
+      providers_twitter_timeline [label="TwitterTimelineProvider"]
+      providers_twitter_following [label="TwitterFollowingProvider"]
+    }
+    subgraph cluster_providers_facebook {
+      label="facebook";
+      providers_facebook_page [label="FacebookPageProvider"]
+      providers_facebook_pagefeed [label="FacebookPageFeedDataCollector"]
+    }
+    subgraph cluster_providers_youtube {
+      label="youtube";
+      providers_youtube_channel [label="YoutubeChannelProvider"]
+      providers_youtube_video [label="YoutubeUserActivityProvider"]
+    }
+    subgraph cluster_providers_instagram{
+      label="instagram";
+      providers_instagram_userinfo [label="InstagramUserInfoCollector"]
+      providers_instagram_media [label="InstagramRecentMediaCollector"]
+    }
+  }
+
+  subgraph cluster_processors {
+    label="processors";
+    processors_twitter_activity [label="ActivityConverterProcessor"]
+    processors_twitter_activityobject [label="ActivityObjectConverterProcessor"]
+    processors_facebook_activity [label="FacebookPageActivitySerializer"]
+    processors_facebook_activityobject [label="FacebookPostActivitySerializer"]
+    processors_youtube_activity [label="YoutubeTypeConverter"]
+    processors_youtube_activityobject [label="YoutubeTypeConverter"]
+    processors_instagram_activity [label="InstagramTypeConverter"]
+    processors_instagram_activityobject [label="InstagramTypeConverter"]
+  }
+
+  subgraph cluster_persisters_1 {
+    label="persisters";
+    persisters_kinesis_writer_activity [label="KinesisPersistWriter"]
+    persisters_kinesis_writer_activityobject [label="KinesisPersistWriter"]
+  }
+
+  subgraph cluster_persisters_2 {
+    label="persisters";
+    persisters_elasticsearch [label="ElasticsearchPersistWriter"]
+    persisters_graph [label="GraphPersistWriter"]
+    persisters_hdfs [label="WebHdfsPersistWriter"]
+    persisters_kinesis_reader_activity [label="KinesisPersistReader"]
+    persisters_kinesis_reader_activityobject [label="KinesisPersistReader"]
+  }
+
+  subgraph cluster_dbs {
+    label="dbs";
+    elasticsearch [label="elasticsearch"]
+    hdfs [label="hdfs"]
+    neo4j [label="neo4j"]
+  }
+
+  generators_twitter -> providers_twitter_userstream
+  generators_twitter -> providers_twitter_timeline
+  generators_twitter -> providers_twitter_following
+  generators_twitter -> providers_twitter_userinfo
+  providers_twitter_userinfo -> processors_twitter_activityobject [label="o.a.s.t.User"]
+  providers_twitter_userstream -> processors_twitter_activity [label="o.a.s.t.Tweet"]
+  providers_twitter_timeline -> processors_twitter_activity [label="o.a.s.t.Tweet"]
+  providers_twitter_following -> processors_twitter_activity [label="o.a.s.t.Follow"]
+
+  generators_facebook -> providers_facebook_page
+  generators_facebook -> providers_facebook_pagefeed
+  providers_facebook_page -> processors_facebook_activityobject [label="o.a.s.f.Page"]
+  providers_facebook_pagefeed -> processors_facebook_activity [label="o.a.s.f.Post\no.a.s.f.Comment"]
+
+  generators_youtube -> providers_youtube_channel
+  generators_youtube -> providers_youtube_video
+  providers_youtube_channel -> processors_youtube_activityobject [label="o.a.s.y.Channel"]
+  providers_youtube_video -> processors_youtube_activity [label="o.a.s.y.Video"]
+
+  generators_instagram -> providers_instagram_userinfo
+  generators_instagram -> providers_instagram_media
+  providers_instagram_userinfo -> processors_instagram_activityobject [label="o.a.s.i.UserInfoData"]
+  providers_instagram_media -> processors_instagram_activity [label="o.a.s.i.MediaFeedData"]
+
+  processors_twitter_activityobject -> persisters_kinesis_writer_activityobject [label="o.a.s.p.j.Page"]
+  processors_twitter_activity -> persisters_kinesis_writer_activity [label="o.a.s.p.j.Post\no.a.s.p.j.Share\no.a.s.p.j.Follow"]
+  processors_facebook_activityobject -> persisters_kinesis_writer_activityobject [label="o.a.s.p.j.Page"]
+  processors_facebook_activity -> persisters_kinesis_writer_activity [label="o.a.s.p.j.Post\no.a.s.p.j.Comment"]
+  processors_youtube_activityobject -> persisters_kinesis_writer_activityobject [label="o.a.s.p.j.Page"]
+  processors_youtube_activity -> persisters_kinesis_writer_activity [label="o.a.s.p.j.Video\no.a.s.p.j.Comment"]
+  processors_instagram_activityobject -> persisters_kinesis_writer_activityobject [label="o.a.s.p.j.Page"]
+  processors_instagram_activity -> persisters_kinesis_writer_activity [label="o.a.s.p.j.Photo\no.a.s.p.j.Video\no.a.s.p.j.Comment"]
+
+  persisters_kinesis_writer_activity -> kinesis -> persisters_kinesis_reader_activity [label="o.a.s.p.j.Activity"]
+  persisters_kinesis_writer_activityobject -> kinesis -> persisters_kinesis_reader_activityobject [label="o.a.s.p.j.ActivityObject"]
+
+  persisters_kinesis_reader_activity -> persisters_elasticsearch
+  persisters_kinesis_reader_activity -> persisters_hdfs
+  persisters_kinesis_reader_activity -> persisters_graph
+  persisters_kinesis_reader_activityobject -> persisters_elasticsearch
+  persisters_kinesis_reader_activityobject -> persisters_hdfs
+
+  persisters_elasticsearch -> elasticsearch
+  persisters_hdfs -> hdfs
+  persisters_graph -> neo4j
+
+}


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
index dfcaf2a..e06e9f2 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -17,6 +17,7 @@
 ~ under the License.
 -->
+
 org.apache.maven.skins
@@ -26,12 +27,13 @@
 true
+ navbar-inverse
 false
 Apache Streams
- ./images/streams_logo.jpg
+ http://streams.incubator.apache.org/images/streams_logo.jpg
 http://streams.incubator.apache.org
 100
 150
@@ -50,6 +52,7 @@
+
@@ -59,11 +62,17 @@
+
+
+
+
+
+
-
-
+
+
@@ -83,5 +92,9 @@
+
+
+
+


http://git-wip-us.apache.org/repos/asf/incubator-streams-master/blob/749300a8/src/site/site_en.xml
----------------------------------------------------------------------
diff --git a/src/site/site_en.xml b/src/site/site_en.xml
index 99d3a23..e06e9f2 100644
--- a/src/site/site_en.xml
+++ b/src/site/site_en.xml
@@ -17,6 +17,7 @@
 ~ under the License.
 -->
+
 org.apache.maven.skins
@@ -26,12 +27,13 @@
 true
+ navbar-inverse
 false
 Apache Streams
- ./images/streams_logo.jpg
+ http://streams.incubator.apache.org/images/streams_logo.jpg
 http://streams.incubator.apache.org
 100
 150
@@ -50,6 +52,7 @@
+
@@ -68,8 +71,8 @@
-
-
+
+
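The `architecture.dot` file added in this commit includes a `processors -> processors` edge, so a pipeline graph can contain cycles, and any code that walks such a graph needs a visited set to terminate. The Java sketch below hand-copies the edges from `architecture.dot`, expanding its comma-separated node lists into individual edges, and computes which downstream systems each upstream system can reach. The class and method names are hypothetical; nothing here parses DOT, calls Graphviz, or uses Apache Streams code. It assumes Java 9+ for `Map.of` and `List.of`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ArchitectureGraphSketch {

    // Edges hand-copied from architecture.dot; comma-separated node lists
    // ("providers,persistReaders -> processors") are expanded into individual edges.
    static final Map<String, List<String>> EDGES = Map.of(
            "upstream_generators", List.of("providers"),
            "upstream_queues", List.of("persistReaders"),
            "upstream_databases", List.of("persistReaders"),
            "providers", List.of("processors"),
            "persistReaders", List.of("processors"),
            "processors", List.of("processors", "persistWriters"), // self-loop: processors may chain
            "persistWriters", List.of("downstream_databases", "downstream_queues"));

    // Depth-first walk returning every node reachable from start (including start);
    // the visited set is what makes the walk terminate despite the self-loop.
    public static Set<String> reachable(String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) {
                continue; // already seen, e.g. via processors -> processors
            }
            for (String next : EDGES.getOrDefault(node, List.of())) {
                stack.push(next);
            }
        }
        return visited;
    }

    // Downstream systems (sorted by name) that a given upstream system can feed.
    public static Set<String> sinksFor(String upstream) {
        Set<String> sinks = new TreeSet<>();
        for (String node : reachable(upstream)) {
            if (node.startsWith("downstream_")) {
                sinks.add(node);
            }
        }
        return sinks;
    }

    public static void main(String[] args) {
        System.out.println(sinksFor("upstream_queues")); // prints [downstream_databases, downstream_queues]
    }
}
```

For example, `sinksFor("upstream_queues")` walks through `persistReaders`, `processors`, and `persistWriters`, visits `processors` only once despite the self-edge, and returns both downstream systems.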