apex-commits mailing list archives

From t..@apache.org
Subject [2/8] incubator-apex-core git commit: Migrating docs
Date Wed, 02 Mar 2016 01:40:28 GMT
http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/application_packages.md
----------------------------------------------------------------------
diff --git a/docs/application_packages.md b/docs/application_packages.md
new file mode 100644
index 0000000..521779a
--- /dev/null
+++ b/docs/application_packages.md
@@ -0,0 +1,669 @@
+Apache Apex Application Packages
+================================
+
+An Apache Apex Application Package is a zip file that contains all the
+necessary files to launch an application in Apache Apex. It is the
+standard way for assembling and sharing an Apache Apex application.
+
+# Requirements
+
+You will need to have the following installed:
+
+1. Apache Maven 3.0 or later (for assembling the App Package)
+2. Apache Apex 3.0.0 or later (for launching the App Package in your cluster)
+
+# Creating Your First Apex App Package
+
+You can create an Apex Application Package using your Linux command
+line, or using your favorite IDE.
+
+## Using Command Line
+
+First, change to the directory where you put your projects, and create
+an Apex application project using Maven by running the following
+command.  Replace "com.example", "mydtapp" and "1.0-SNAPSHOT" with the
+appropriate values (the trailing backslashes continue the command across lines):
+
+    $ mvn archetype:generate \
+     -DarchetypeGroupId=org.apache.apex \
+     -DarchetypeArtifactId=apex-app-archetype -DarchetypeVersion=3.2.0-incubating \
+     -DgroupId=com.example -Dpackage=com.example.mydtapp -DartifactId=mydtapp \
+     -Dversion=1.0-SNAPSHOT
+
+This creates a Maven project named "mydtapp". Open it with your favorite
+IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA). In the project, there is a
+sample DAG that generates a number of tuples with a random number and
+prints out "hello world" and the random number in the tuples.  The code
+that builds the DAG is in
+src/main/java/com/example/mydtapp/Application.java, and the code that
+runs the unit test for the DAG is in
+src/test/java/com/example/mydtapp/ApplicationTest.java. Try it out by
+running the following command:
+
+    $ cd mydtapp; mvn package
+
+This builds the App Package and runs the unit test of the DAG.  You
+should see test output similar to this:
+
+```
+ -------------------------------------------------------
+  TESTS
+ -------------------------------------------------------
+
+ Running com.example.mydtapp.ApplicationTest
+ hello world: 0.8015370953286478
+ hello world: 0.9785359225545481
+ hello world: 0.6322611586644047
+ hello world: 0.8460953663451775
+ hello world: 0.5719372906929072
+ hello world: 0.6361174312337172
+ hello world: 0.14873007534816318
+ hello world: 0.8866986277418261
+ hello world: 0.6346526809866057
+ hello world: 0.48587295703904465
+ hello world: 0.6436832429676687
+
+ ...
+
+ Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.863
+ sec
+
+ Results :
+
+ Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
+```
+
+The "mvn package" command creates the App Package file in the target
+directory as target/mydtapp-1.0-SNAPSHOT.apa. You can use
+that App Package file to launch this sample application in your actual
+Apex installation.
+
+## Using IDE
+
+Alternatively, you can do the above steps all within your IDE.  For
+example, in NetBeans, select File -> New Project.  Then choose “Maven”
+and “Project from Archetype” in the dialog box, as shown.
+
+![](images/AppPackage/ApplicationPackages.html-image00.png)
+
+Then fill the Group ID, Artifact ID, Version and Repository entries as shown below.
+
+![](images/AppPackage/ApplicationPackages.html-image02.png)
+
+- Group ID: org.apache.apex
+- Artifact ID: apex-app-archetype
+- Version: 3.2.0-incubating (or any later version)
+
+Press Next and fill out the rest of the required information. For
+example:
+
+![](images/AppPackage/ApplicationPackages.html-image01.png)
+
+Click Finish, and now you have created your own Apache Apex App Package
+project, with a default unit test.  You can run the unit test, make code
+changes or make dependency changes within your IDE.  The procedure for
+other IDEs, like Eclipse or IntelliJ, is similar.
+
+# Writing Your Own App Package
+
+
+Please refer to [Creating Apps](create.md) for the basics of writing an Apache Apex application.  In your App Package project, you can add custom operators (refer to the [Operator Development Guide](https://www.datatorrent.com/docs/guides/OperatorDeveloperGuide.html)), project dependencies, default and required configuration properties, pre-set configurations, and other metadata.
+
+## Adding (and removing) project dependencies
+
+Under the project, you can add project dependencies in pom.xml, or do it
+through your IDE.  Here’s the section that describes the dependencies in
+the default pom.xml:
+```
+  <dependencies>
+    <!-- add your dependencies here -->
+    <dependency>
+      <groupId>org.apache.apex</groupId>
+      <artifactId>malhar-library</artifactId>
+      <version>${apex.version}</version>
+      <!--
+           If you know your application does not need the transitive dependencies that are pulled in by malhar-library,
+           uncomment the following to reduce the size of your app package.
+      -->
+      <!--
+      <exclusions>
+        <exclusion>
+          <groupId>*</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+      </exclusions>
+      -->
+    </dependency>
+    <dependency>
+      <groupId>org.apache.apex</groupId>
+      <artifactId>apex-engine</artifactId>
+      <version>${apex.version}</version>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>junit</groupId>
+      <artifactId>junit</artifactId>
+      <version>4.10</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+```
+
+As shown above, the default dependencies include
+malhar-library in compile scope, apex-engine in provided scope, and junit
+in test scope.  Do not remove these three dependencies since they are
+necessary for any Apex application.  You can, however, exclude
+transitive dependencies from malhar-library to reduce the size of your
+App Package, provided that none of the operators in malhar-library that
+need the transitive dependencies will be used in your application.
+
+In the sample application, it is safe to remove the transitive
+dependencies from malhar-library, by uncommenting the "exclusions"
+section.  It will reduce the size of the sample App Package from 8MB to
+700KB.
+
+Note that if you exclude \*, some versions of Maven may emit
+warnings similar to the following:
+
+```
+
+ [WARNING] 'dependencies.dependency.exclusions.exclusion.groupId' for
+ org.apache.apex:malhar-library:jar with value '*' does not match a
+ valid id pattern.
+
+ [WARNING]
+ [WARNING] It is highly recommended to fix these problems because they
+ threaten the stability of your build.
+ [WARNING]
+ [WARNING] For this reason, future Maven versions might no longer support
+ building such malformed projects.
+ [WARNING]
+
+```
+This is a bug in early versions of Maven 3.  The dependency exclusion is
+still valid and it is safe to ignore these warnings.
+
+## Application Configuration
+
+A configuration file can be used to configure an application.  Several
+kinds of configuration parameters can be specified: application
+attributes, operator attributes and properties, port attributes, stream
+properties and application-specific properties. They are all specified
+as name-value pairs, in XML format, like the following.
+
+```
+<?xml version="1.0"?>
+<configuration>
+  <property>
+    <name>some_name_1</name>
+    <value>some_default_value</value>
+  </property>
+  <property>
+    <name>some_name_2</name>
+    <value>some_default_value</value>
+  </property>
+</configuration>
+```
+
+## Application attributes
+
+Application attributes are used to specify the platform behavior for the
+application. They can be specified using the parameter
+```dt.attr.<attribute>```. The prefix “dt” is a constant, “attr” is a
+constant denoting an attribute is being specified and ```<attribute>```
+specifies the name of the attribute. Below is an example snippet setting
+the streaming window size of the application to 1000 milliseconds.
+
+```
+  <property>
+     <name>dt.attr.STREAMING_WINDOW_SIZE_MILLIS</name>
+     <value>1000</value>
+  </property>
+```
+
+The name tag specifies the attribute and the value tag specifies the
+attribute value. The name of the attribute is a Java constant name
+identifying the attribute. The constants are defined in
+com.datatorrent.api.Context.DAGContext and the different attributes can
+be specified in the format described above.
+
+## Operator attributes
+
+Operator attributes are used to specify the platform behavior for the
+operator. They can be specified using the parameter
+```dt.operator.<operator-name>.attr.<attribute>```. The prefix “dt” is a
+constant, “operator” is a constant denoting that an operator is being
+specified, ```<operator-name>``` denotes the name of the operator, “attr” is
+the constant denoting that an attribute is being specified and
+```<attribute>``` is the name of the attribute. The operator name is the
+same name that is specified when the operator is added to the DAG using
+the addOperator method. An example illustrating the specification is
+shown below. It sets the number of streaming windows per
+application window of an operator named “input” to 10.
+
+```
+<property>
+  <name>dt.operator.input.attr.APPLICATION_WINDOW_COUNT</name>
+  <value>10</value>
+</property>
+```
+
+The name tag specifies the attribute and the value tag specifies the
+attribute value. The name of the attribute is a Java constant name
+identifying the attribute. The constants are defined in
+com.datatorrent.api.Context.OperatorContext and the different attributes
+can be specified in the format described above.
+
+## Operator properties
+
+Operators can be configured using operator specific properties. The
+properties can be specified using the parameter
+```dt.operator.<operator-name>.prop.<property-name>```. The difference
+between this and the operator attribute specification described above is
+that the keyword “prop” is used to denote that it is a property and
+```<property-name>``` specifies the property name.  An example illustrating
+this is shown below. It sets the property “host” (the address of the
+redis server) of a “redis” output operator.
+
+```
+  <property>
+    <name>dt.operator.redis.prop.host</name>
+    <value>127.0.0.1</value>
+  </property>
+```
+
+The name tag specifies the property and the value tag specifies the
+property value. The property name is converted to a setter method which
+is called on the actual operator. The method name is composed by
+prefixing the property name, with its first character capitalized, with
+the word “set”. In the above example the setter method becomes
+setHost. The method is called using Java reflection and the property
+value is passed as an argument. In the above example the method setHost
+will be called on the “redis” operator with “127.0.0.1” as the argument.
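The name-to-setter mapping can be sketched in plain Java. This is a simplified stand-in for illustration, not the actual platform code, and `RedisOperator` here is a hypothetical class:

```java
import java.lang.reflect.Method;

public class PropertySetterDemo
{
  // Builds the setter name the platform looks for: "host" -> "setHost"
  public static String setterName(String propertyName)
  {
    return "set" + Character.toUpperCase(propertyName.charAt(0)) + propertyName.substring(1);
  }

  // Hypothetical operator class with a "host" property
  public static class RedisOperator
  {
    public String host;
    public void setHost(String host) { this.host = host; }
  }

  public static void main(String[] args) throws Exception
  {
    RedisOperator op = new RedisOperator();
    // Reflectively invoke setHost("127.0.0.1"), mirroring what the platform
    // does for dt.operator.redis.prop.host
    Method setter = RedisOperator.class.getMethod(setterName("host"), String.class);
    setter.invoke(op, "127.0.0.1");
    System.out.println(op.host);  // prints 127.0.0.1
  }
}
```

This mirrors why a property named host in the configuration requires a public setHost method on the operator.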
+
+## Port attributes
+Port attributes are used to specify the platform behavior for input and
+output ports. They can be specified using the parameter ```dt.operator.<operator-name>.inputport.<port-name>.attr.<attribute>```
+for input port and ```dt.operator.<operator-name>.outputport.<port-name>.attr.<attribute>```
+for output port. The keyword “inputport” is used to denote an input port
+and “outputport” to denote an output port. The rest of the specification
+follows the conventions described in other specifications above. An
+example illustrating this is shown below. It sets the queue
+capacity of an input port named “input” of an operator named “range” to
+4000.
+
+```
+<property>
+  <name>dt.operator.range.inputport.input.attr.QUEUE_CAPACITY</name>
+  <value>4000</value>
+</property>
+```
+
+The name tag specifies the attribute and the value tag specifies the
+attribute value. The name of the attribute is a Java constant name
+identifying the attribute. The constants are defined in
+com.datatorrent.api.Context.PortContext and the different attributes can
+be specified in the format described above.
+
+The attributes for an output port can also be specified in a similar way
+as described above, with the change that the keyword “outputport” is used
+instead of “inputport”. A generic keyword “port” can be used to specify
+either an input or an output port. It is useful in the wildcard
+specification described below.
+
+## Stream properties
+
+Streams can be configured using stream properties. The properties can be
+specified using the parameter
+```dt.stream.<stream-name>.prop.<property-name>```.  The constant “stream”
+specifies that it is a stream, ```<stream-name>``` specifies the name of the
+stream and ```<property-name>``` the name of the property. The name of the
+stream is the same name that is passed when the stream is added to the
+DAG using the addStream method. An example illustrating the
+specification is shown below. It sets the locality of the stream named
+“stream1” to CONTAINER_LOCAL, indicating that the operators connected
+by the stream should run in the same container.
+
+```
+  <property>
+    <name>dt.stream.stream1.prop.locality</name>
+    <value>CONTAINER_LOCAL</value>
+  </property>
+```
+
+The property name is converted into a setter method on the stream in the
+same way as described in the operator properties section above. In this
+case the method would be setLocality and it will be called on the stream
+“stream1” with the value as the argument.
+
+Along with the above system-defined parameters, applications can
+define their own specific parameters, which can be specified in the
+configuration file. The only condition is that the names of these
+parameters don’t conflict with the system-defined parameters or similar
+application parameters defined by other applications. To this end, it is
+recommended that the application parameters have the format
+```<full-application-class-name>.<param-name>```. The
+full-application-class-name is the full Java class name of the
+application including the package path and param-name is the name of the
+parameter within the application. The application still has to
+read the parameter using the configuration API of the
+configuration object that is passed into populateDAG.
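For example, an application whose main class is com.example.mydtapp.Application (a hypothetical name) could define a threshold parameter like this:

```
  <property>
    <name>com.example.mydtapp.Application.threshold</name>
    <value>50</value>
  </property>
```

The application would then read it inside populateDAG with configuration.get("com.example.mydtapp.Application.threshold").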
+
+##  Wildcards
+
+Wildcards and regular expressions can be used in place of names to
+specify a group for applications, operators, ports or streams. For
+example, to specify an attribute for all ports of an operator it can be
+done as follows
+```
+<property>
+  <name>dt.operator.range.port.*.attr.QUEUE_CAPACITY</name>
+  <value>4000</value>
+</property>
+```
+
+The wildcard “\*” was used instead of the name of the port. Wildcards can
+also be used for operator names, stream names or application names. Regular
+expressions can also be used for names to specify attributes or
+properties for a specific set.
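For example, to set the same queue capacity on the “input” port of every operator, the operator name itself can be replaced by a wildcard:

```
<property>
  <name>dt.operator.*.inputport.input.attr.QUEUE_CAPACITY</name>
  <value>4000</value>
</property>
```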
+
+## Adding configuration properties
+
+It is common for applications to require configuration parameters to
+run.  For example, the address and port of the database, the location of
+a file for ingestion, etc.  You can specify them in
+src/main/resources/META-INF/properties.xml under the App Package
+project. The properties.xml may look like:
+
+```
+<?xml version="1.0"?>
+<configuration>
+  <property>
+    <name>some_name_1</name>
+  </property>
+  <property>
+    <name>some_name_2</name>
+    <value>some_default_value</value>
+  </property>
+</configuration>
+```
+
+The name of an application-specific property takes the form of:
+
+```dt.operator.{opName}.prop.{propName} ```
+
+This represents the property named propName of the operator opName.
+Alternatively, you can set the application name at run time by setting this
+property:
+
+        dt.attr.APPLICATION_NAME
+
+There are also other properties that can be set.  For details on
+properties, refer to the [Operation and Installation Guide](https://www.datatorrent.com/docs/guides/OperationandInstallationGuide.html).
+
+In this example, property some_name_1 is a required property which
+must be set at launch time, or it must be set by a pre-set configuration
+(see next section).  Property some\_name\_2 is a property that is
+assigned with value some\_default\_value unless it is overridden at
+launch time.
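For example, a required property such as some_name_1 can be supplied at launch time with a -D define in the Apex CLI (the value shown here is a hypothetical placeholder):

```
 dt> launch -D some_name_1=some_value mydtapp-1.0-SNAPSHOT.apa
```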
+
+## Adding pre-set configurations
+
+
+At build time, you can add pre-set configurations to the App Package by
+adding configuration XML files under ```src/site/conf/<conf>.xml``` in your
+project.  You can then specify which configuration to use at launch
+time.  The configuration XML is in the same format as the properties.xml
+file.
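For example, a pre-set configuration stored as src/site/conf/my-preset.xml (my-preset is a hypothetical name) could supply a value for a required property:

```
<?xml version="1.0"?>
<configuration>
  <property>
    <name>some_name_1</name>
    <value>some_value_for_this_preset</value>
  </property>
</configuration>
```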
+
+## Application-specific properties file
+
+You can also specify properties.xml per application in the application
+package.  Just create a file with the name properties-{appName}.xml and
+it will be picked up when you launch the application with the specified
+name within the application package.  In short:
+
+  properties.xml: Properties that are global to the Application
+Package.
+
+  properties-{appName}.xml: Properties that apply only when launching
+an application with the specified appName.
+
+## Properties source precedence
+
+If properties with the same key appear in multiple sources (e.g. from
+app package default configuration as META-INF/properties.xml, from app
+package configuration in the conf directory, from launch time defines,
+etc), the precedence of sources, from highest to lowest, is as follows:
+
+1. Launch time defines (using -D option in CLI, or the POST payload
+    with the Gateway REST API’s launch call)
+2. Launch time specified configuration file in file system (using -conf
+    option in CLI)
+3. Launch time specified package configuration (using -apconf option in
+    CLI or the conf={confname} with Gateway REST API’s launch call)
+4. Configuration from \$HOME/.dt/dt-site.xml
+5. Application defaults within the package as
+    META-INF/properties-{appname}.xml
+6. Package defaults as META-INF/properties.xml
+7. dt-site.xml in local DT installation
+8. dt-site.xml stored in HDFS
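The resolution rule amounts to “the first source that defines a key wins.” Below is a minimal plain-Java sketch of that lookup (an illustration, not platform code), with sources ordered from highest to lowest precedence:

```java
import java.util.List;
import java.util.Map;

public class PrecedenceSketch
{
  // Returns the value from the first (highest-precedence) source defining the key.
  public static String resolve(String key, List<Map<String, String>> sourcesHighestFirst)
  {
    for (Map<String, String> source : sourcesHighestFirst) {
      if (source.containsKey(key)) {
        return source.get(key);
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    // A launch-time define overrides the package default for the same key.
    Map<String, String> launchDefines = Map.of("some_name_2", "launch_value");
    Map<String, String> packageDefaults = Map.of("some_name_2", "some_default_value",
        "some_name_1", "required_value");
    List<Map<String, String>> sources = List.of(launchDefines, packageDefaults);

    System.out.println(resolve("some_name_2", sources)); // prints launch_value
    System.out.println(resolve("some_name_1", sources)); // prints required_value
  }
}
```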
+
+## Other meta-data
+
+In an Apex App Package project, the pom.xml file contains a
+section that looks like:
+
+```
+<properties>
+  <apex.version>3.2.0-incubating</apex.version>
+  <apex.apppackage.classpath>lib/*.jar</apex.apppackage.classpath>
+</properties>
+```
+apex.version is the Apache Apex version that is to be used
+with this Application Package.
+
+apex.apppackage.classpath is the classpath that is used when
+launching the application in the Application Package.  The default is
+lib/\*.jar, where lib is where all the dependency jars are kept within
+the Application Package.  One reason to change this field is when your
+Application Package needs the classpath in a specific order.
+
+## Logging configuration
+
+Just like other Java projects, you can change the logging configuration
+by having your log4j.properties under src/main/resources.  For example,
+if you have the following in src/main/resources/log4j.properties:
+```
+ log4j.rootLogger=WARN,CONSOLE
+ log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
+ log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
+ log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c{2} %M - %m%n
+```
+
+The root logger’s level is set to WARN and the output is set to the console (stdout).
+
+Note that, by default, a project created from the Maven archetype
+already has a log4j.properties file under src/test/resources, and
+that file is used only for the unit tests.
+
+# Zip Structure of Application Package
+
+
+Apache Apex Application Package files are zip files.  You can examine the content of any Application Package by using unzip -t on your Linux command line.
+
+There are five top-level directories in an Application Package:
+
+1. "app" contains the jar files of the DAG code and any custom operators.
+2. "lib" contains all dependency jars.
+3. "conf" contains all the pre-set configuration XML files.
+4. "META-INF" contains the MANIFEST.MF file and the properties.xml file.
+5. "resources" contains other files that are to be served by the Gateway on behalf of the app package.
+
+
+# Managing Application Packages Through DT Gateway
+
+The DT Gateway provides the ability to store and retrieve Application
+Packages in your distributed file system, e.g. HDFS.
+
+## Storing an Application Package
+
+You can store your Application Packages through DT Gateway using this
+REST call:
+
+```
+ POST /ws/v2/appPackages
+```
+
+The payload is the raw content of your Application Package.  For
+example, you can issue this request using curl on your Linux command
+line like this, assuming your DT Gateway is accepting requests at
+localhost:9090:
+
+```
+$ curl -XPOST -T <app-package-file> http://localhost:9090/ws/v2/appPackages
+```
+
+## Getting Meta Information on Application Packages
+
+
+You can get the meta information on Application Packages stored through
+DT Gateway using this call.  The information includes the logical plan
+of each application within the Application Package.
+
+```
+ GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}
+```
+
+## Getting Available Operators In Application Package
+
+You can get the list of available operators in the Application Package
+using this call.
+
+```
+GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/operators?parent={parent}
+```
+
+The parent parameter is optional.  If given, parent should be a fully
+qualified class or interface name.  The call will only return operators
+that derive from that class or interface. For example, if parent is
+com.datatorrent.api.InputOperator, this call will only return input
+operators provided by the Application Package.
+
+## Getting Properties of Operators in Application Package
+
+You can get the list of properties of any operator in the Application
+Package using this call.
+
+```
+GET  /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/operators/{className}
+```
+
+## Getting List of Pre-Set Configurations in Application Package
+
+You can get a list of pre-set configurations within the Application
+Package using this call.
+
+```
+GET /ws/v2/appPackages/{owner}/{pkgName}/{packageVersion}/configs
+```
+
+You can also get the content of a specific pre-set configuration within
+the Application Package.
+
+```
+ GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
+```
+
+## Changing Pre-Set Configurations in Application Package
+
+You can create or replace pre-set configurations within the Application
+Package:
+```
+ PUT   /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
+```
+The payload of this PUT call is the XML file that represents the pre-set configuration, with a Content-Type of "application/xml".  You can also delete a pre-set configuration within the Application Package:
+```
+ DELETE /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/configs/{configName}
+```
+
+## Retrieving an Application Package
+
+You can download the Application Package file.  This Application Package
+is not necessarily the same file as the one that was originally uploaded
+since the pre-set configurations may have been modified.
+
+```
+ GET /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/download
+```
+
+## Launching an Application Package
+
+You can launch an application within an Application Package.
+```
+POST /ws/v2/appPackages/{owner}/{pkgName}/{pkgVersion}/applications/{appName}/launch?config={configName}
+```
+
+The config parameter is optional.  If given, it must be one of the
+pre-set configurations within the given Application Package.  The
+Content-Type of the payload of the POST request is "application/json"
+and it should contain the properties to launch the application with.
+It is of the form:
+
+```
+ {"property-name":"property-value", ... }
+```
+
+Here is an example of launching an application through curl:
+
+```
+ $ curl -XPOST -d'{"dt.operator.console.prop.stringFormat":"xyz %s"}' \
+ http://localhost:9090/ws/v2/appPackages/dtadmin/mydtapp/1.0-SNAPSHOT/applications/MyFirstApplication/launch
+```
+
+Please refer to the [Gateway API reference](https://www.datatorrent.com/docs/guides/DTGatewayAPISpecification.html) for the complete specification of the REST API.
+
+# Examining and Launching Application Packages Through Apex CLI
+
+If you are working with Application Packages in the local filesystem and
+do not want to deal with the DT Gateway, you can use the Apex Command Line
+Interface (dtcli).  Please refer to the [Gateway API](dtgateway_api.md)
+to see samples for these commands.
+
+## Getting Application Package Meta Information
+
+You can get the meta information about the Application Package using
+this Apex CLI command.
+
+```
+ dt> get-app-package-info <app-package-file>
+```
+
+## Getting Available Operators In Application Package
+
+You can get the list of available operators in the Application Package
+using this command.
+
+```
+ dt> get-app-package-operators <app-package-file> <package-prefix>
+ [parent-class]
+```
+
+## Getting Properties of Operators in Application Package
+
+You can get the list of properties of any operator in the Application
+Package using this command.
+
+```
+ dt> get-app-package-operator-properties <app-package-file> <operator-class>
+```
+
+
+## Launching an Application Package
+
+You can launch an application within an Application Package.
+```
+dt> launch [-D property-name=property-value, ...] [-conf config-name]
+ [-apconf config-file-within-app-package] <app-package-file>
+ [matching-app-name]
+```
+Note that -conf expects a configuration file in the file system, while -apconf expects a configuration file within the app package.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/autometrics.md
----------------------------------------------------------------------
diff --git a/docs/autometrics.md b/docs/autometrics.md
new file mode 100644
index 0000000..f6000e8
--- /dev/null
+++ b/docs/autometrics.md
@@ -0,0 +1,311 @@
+Apache Apex AutoMetrics
+=======================
+
+# Introduction
+Metrics collect various statistical information about a process, which can be very useful for diagnosis. AutoMetrics in Apex help monitor operators in a running application.  The goal of the *AutoMetric* API is to enable an operator developer to define relevant metrics for an operator in a simple way, which the platform then collects and reports automatically.
+
+# Specifying AutoMetrics in an Operator
+An *AutoMetric* can be any object. It can be of a primitive type - int, long, etc. - or a complex one. A field or a `get` method in an operator can be annotated with `@AutoMetric` to specify that its value is a metric. At the end of every application window, the platform collects the values of these fields/methods in a map and sends it to the application master.
+
+```java
+public class LineReceiver extends BaseOperator
+{
+ @AutoMetric
+ long length;
+
+ @AutoMetric
+ long count;
+
+ public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
+ {
+   @Override
+   public void process(String s)
+   {
+     length += s.length();
+     count++;
+   }
+ };
+
+ @Override
+ public void beginWindow(long windowId)
+ {
+   length = 0;
+   count = 0;
+ }
+}
+```
+
+There are two auto-metrics declared in the `LineReceiver`. At the end of each application window, the platform will send a map with two entries - `[(length, 100), (count, 10)]` - to the application master.
+
+# Aggregating AutoMetrics across Partitions
+When an operator is partitioned, it is useful to aggregate the values of auto-metrics across all its partitions every window to get a logical view of these metrics. The application master performs these aggregations using metrics aggregators.
+
+The AutoMetric API helps achieve this by providing an interface for writing aggregators - `AutoMetric.Aggregator`. Any implementation of `AutoMetric.Aggregator` can be set as the operator attribute `METRICS_AGGREGATOR` for a particular operator, which in turn is used for aggregating that operator's physical metrics.
+
+## Default aggregators
+[`MetricsAggregator`](https://github.com/apache/incubator-apex-core/blob/devel-3/common/src/main/java/com/datatorrent/common/metric/MetricsAggregator.java) is a simple implementation of `AutoMetric.Aggregator` that the platform uses by default for summing up primitive types - int, long, float and double.
+
+`MetricsAggregator` is just a collection of `SingleMetricAggregator`s. There are multiple implementations of `SingleMetricAggregator` that perform sum, min, max and avg, present in Apex Core and Apex Malhar.
+
+For the `LineReceiver` operator, the application developer need not specify any aggregator. The platform will automatically inject an instance of `MetricsAggregator` that contains two `LongSumAggregator`s - one for `length` and one for `count`. This aggregator will report the sum of `length` and the sum of `count` across all the partitions of `LineReceiver`.
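For instance, if two partitions report `[(length, 60), (count, 6)]` and `[(length, 40), (count, 4)]` in the same window, the logical view is `[(length, 100), (count, 10)]`. Below is a plain-Java sketch of that summing behavior (an illustration, not the actual `MetricsAggregator` code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SumAcrossPartitions
{
  // Sums each metric key across the physical partitions' metric maps.
  public static Map<String, Long> aggregate(List<Map<String, Long>> partitionMetrics)
  {
    Map<String, Long> logical = new HashMap<>();
    for (Map<String, Long> metrics : partitionMetrics) {
      metrics.forEach((name, value) -> logical.merge(name, value, Long::sum));
    }
    return logical;
  }

  public static void main(String[] args)
  {
    Map<String, Long> p1 = Map.of("length", 60L, "count", 6L);
    Map<String, Long> p2 = Map.of("length", 40L, "count", 4L);
    System.out.println(aggregate(List.of(p1, p2)));  // sums: length=100, count=10
  }
}
```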
+
+
+## Building custom aggregators
+The platform cannot perform any meaningful aggregations for non-numeric metrics. In such cases, the operator or application developer can write custom aggregators. Suppose the `LineReceiver` is modified to have a complex metric as shown below.
+
+```java
+public class AnotherLineReceiver extends BaseOperator
+{
+  @AutoMetric
+  final LineMetrics lineMetrics = new LineMetrics();
+
+  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>()
+  {
+    @Override
+    public void process(String s)
+    {
+      lineMetrics.length += s.length();
+      lineMetrics.count++;
+    }
+  };
+
+  @Override
+  public void beginWindow(long windowId)
+  {
+    lineMetrics.length = 0;
+    lineMetrics.count = 0;
+  }
+
+  public static class LineMetrics implements Serializable
+  {
+    long length;
+    long count;
+
+    private static final long serialVersionUID = 201511041908L;
+  }
+}
+```
+
+Below is a custom aggregator that can calculate average line length across all partitions of `AnotherLineReceiver`.
+
+```java
+public class AvgLineLengthAggregator implements AutoMetric.Aggregator
+{
+
+  Map<String, Object> result = Maps.newHashMap();
+
+  @Override
+  public Map<String, Object> aggregate(long l, Collection<AutoMetric.PhysicalMetricsContext> collection)
+  {
+    long totalLength = 0;
+    long totalCount = 0;
+    for (AutoMetric.PhysicalMetricsContext pmc : collection) {
+      AnotherLineReceiver.LineMetrics lm = (AnotherLineReceiver.LineMetrics)pmc.getMetrics().get("lineMetrics");
+      totalLength += lm.length;
+      totalCount += lm.count;
+    }
+    if (totalCount > 0) {  // avoid division by zero before any lines are received
+      result.put("avgLineLength", totalLength / totalCount);
+    }
+    return result;
+  }
+}
+```
+An instance of the above aggregator can be specified as the `METRICS_AGGREGATOR` for `AnotherLineReceiver` while creating the DAG, as shown below.
+
+```java
+  @Override
+  public void populateDAG(DAG dag, Configuration configuration)
+  {
+    ...
+    AnotherLineReceiver lineReceiver = dag.addOperator("LineReceiver", new AnotherLineReceiver());
+    dag.setAttribute(lineReceiver, Context.OperatorContext.METRICS_AGGREGATOR, new AvgLineLengthAggregator());
+    ...
+  }
+```
+
+# Retrieving AutoMetrics
+The Gateway REST API provides a way to retrieve the latest AutoMetrics for each logical operator.  For example:
+
+```
+GET /ws/v2/applications/{appid}/logicalPlan/operators/{opName}
+{
+    ...
+    "autoMetrics": {
+       "count": "71314",
+       "length": "27780706"
+    },
+    "className": "com.datatorrent.autometric.LineReceiver",
+    ...
+}
+```
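
For scripted access, the endpoint path can be assembled from the application id and operator name. A minimal sketch (the class and the example values here are illustrative, not part of the Gateway API):

```java
// Builds the Gateway REST path for the latest AutoMetrics of a logical
// operator.  The application id and operator name below are placeholders.
public class MetricsEndpoint {
  public static String path(String appId, String opName) {
    return "/ws/v2/applications/" + appId + "/logicalPlan/operators/" + opName;
  }

  public static void main(String[] args) {
    // Combine with the Gateway address, e.g. http://localhost:9090 + path(...)
    System.out.println(path("application_1457487453_0001", "LineReceiver"));
  }
}
```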
+
+# System Metrics
+System metrics are standard operator metrics provided by the system.  Examples include:
+
+- processed tuples per second
+- emitted tuples per second
+- total tuples processed
+- total tuples emitted
+- latency
+- CPU percentage
+- failure count
+- checkpoint elapsed time
+
+The Gateway REST API provides a way to retrieve the latest values for all of the above for each of the logical operators in the application.
+
+```
+GET /ws/v2/applications/{appid}/logicalPlan/operators/{opName}
+{
+    ...
+    "cpuPercentageMA": "{cpuPercentageMA}",
+    "failureCount": "{failureCount}",
+    "latencyMA": "{latencyMA}",  
+    "totalTuplesEmitted": "{totalTuplesEmitted}",
+    "totalTuplesProcessed": "{totalTuplesProcessed}",
+    "tuplesEmittedPSMA": "{tuplesEmittedPSMA}",
+    "tuplesProcessedPSMA": "{tuplesProcessedPSMA}",
+    ...
+}
+```
+
+However, just like AutoMetrics, the Gateway only provides the latest values.  For historical metrics, we need the help of the App Data Tracker.
+
+# App Data Tracker
+As discussed above, STRAM aggregates the AutoMetrics from the physical operators (partitions) into values that make sense for one logical operator.  Every second, it pushes the aggregated AutoMetrics, along with the system metrics for each operator, to the Gateway over WebSocket.  The Gateway relays this information to an application called App Data Tracker.  This is another Apex application that runs in the background, further aggregates the incoming values by time bucket, and stores them in HDHT.  It also lets external clients retrieve the aggregated AutoMetrics and system metrics through a WebSocket interface.
+
+![AppDataTracker](images/autometrics/adt.png)
+
+App Data Tracker is enabled by having these properties in dt-site.xml:
+
+```xml
+<property>
+  <name>dt.appDataTracker.enable</name>
+  <value>true</value>
+</property>
+<property>
+  <name>dt.appDataTracker.transport</name>
+  <value>builtin:AppDataTrackerFeed</value>
+</property>
+<property>
+  <name>dt.attr.METRICS_TRANSPORT</name>
+  <value>builtin:AppDataTrackerFeed</value>
+</property>
+```
+
+All applications launched after the App Data Tracker is enabled will send their metrics to it.
+
+**Note**: The App Data Tracker runs in dtManage as a “system app”.  It shows up when the “show system apps” button is pressed.
+
+By default, the time buckets that the App Data Tracker aggregates on are one minute, one hour and one day.  These can be overridden by changing the operator attribute `METRICS_DIMENSIONS_SCHEME`.
+
+Also by default, the App Data Tracker performs all of these aggregations on all numeric metrics: SUM, MIN, MAX, AVG, COUNT, FIRST and LAST.  You can override this too by changing the same operator attribute `METRICS_DIMENSIONS_SCHEME`, provided any custom aggregator is known to the App Data Tracker (see the next section).
+
+# Custom Aggregator in App Data Tracker
+Custom aggregators allow you to perform your own computation on the statistics generated by any of your applications. To implement a custom aggregator you have to provide two operations:
+
+1. Combine new inputs with the current aggregation
+2. Combine two aggregations into one aggregation
+
+Let’s consider the case where we want to perform the following rolling average:
+
+Y_n = ½·X_n + ¼·X_(n-1) + ⅛·X_(n-2) + ... (each older input carries half the weight of the next newer one)
+
+This aggregation could be performed by the following Custom Aggregator:
+
+```java
+@Name("IIRAVG")
+public class AggregatorIIRAVG extends AbstractIncrementalAggregator
+{
+  ...
+
+  private void aggregateHelper(DimensionsEvent dest, DimensionsEvent src)
+  {
+    double[] destVals = dest.getAggregates().getFieldsDouble();
+    double[] srcVals = src.getAggregates().getFieldsDouble();
+
+    for (int index = 0; index < destVals.length; index++) {
+      destVals[index] = .5 * destVals[index] + .5 * srcVals[index];
+    }
+  }
+
+  @Override
+  public void aggregate(Aggregate dest, InputEvent src)
+  {
+    //Aggregate a current aggregation with a new input
+    aggregateHelper(dest, src);
+  }
+
+  @Override
+  public void aggregate(Aggregate destAgg, Aggregate srcAgg)
+  {
+    //Combine two existing aggregations together
+    aggregateHelper(destAgg, srcAgg);
+  }
+}
+```
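
As a standalone sanity check of the recursion above (plain Java, not platform code), repeatedly applying `dest = 0.5 * dest + 0.5 * src` weights the newest input by ½, the one before it by ¼, and so on, with the oldest input keeping the residual weight:

```java
// Verifies the weight series produced by the IIR recursion used in
// AggregatorIIRAVG, on a plain array of inputs.
public class IirAvgCheck {
  public static double aggregate(double[] inputs) {
    double dest = inputs[0];                 // the first input seeds the aggregate
    for (int i = 1; i < inputs.length; i++) {
      dest = 0.5 * dest + 0.5 * inputs[i];   // same recursion as aggregateHelper
    }
    return dest;
  }

  public static void main(String[] args) {
    // ½*2 + ¼*4 + ¼*8 = 1 + 1 + 2 = 4 (the oldest input keeps the leftover weight)
    System.out.println(aggregate(new double[] {8, 4, 2}));
  }
}
```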
+
+## Discovery of Custom Aggregators
+The App Data Tracker searches for custom aggregator jars in the following directories before launching:
+
+1. {dt\_installation\_dir}/plugin/aggregators
+2. {user\_home\_dir}/.dt/plugin/aggregators
+
+It uses reflection to find, in these jars, all classes that extend `IncrementalAggregator` or `OTFAggregator`, and registers each of them under the name given by its `@Name` annotation (or the class name when `@Name` is absent).
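
The name-resolution rule can be sketched as follows; the `Name` annotation and aggregator classes below are stand-ins for illustration, not the actual App Data Tracker types:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrates deriving a registration name from @Name, falling back to the
// simple class name when the annotation is absent.
public class AggregatorRegistry {
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Name { String value(); }

  @Name("IIRAVG")
  public static class AggregatorIIRAVG { }

  public static class AggregatorTopN { }     // no @Name annotation

  public static String registrationName(Class<?> c) {
    Name n = c.getAnnotation(Name.class);
    return n != null ? n.value() : c.getSimpleName();
  }

  public static void main(String[] args) {
    System.out.println(registrationName(AggregatorIIRAVG.class));  // IIRAVG
    System.out.println(registrationName(AggregatorTopN.class));    // AggregatorTopN
  }
}
```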
+
+# Using `METRICS_DIMENSIONS_SCHEME`
+
+Here is a sample code snippet showing how you can use `METRICS_DIMENSIONS_SCHEME` to set your own time buckets and your own set of aggregators for certain AutoMetrics processed by the App Data Tracker in your application.
+
+```java
+  @Override
+  public void populateDAG(DAG dag, Configuration configuration)
+  {
+    ...
+    LineReceiver lineReceiver = dag.addOperator("LineReceiver", new LineReceiver());
+    ...
+    AutoMetric.DimensionsScheme dimensionsScheme = new AutoMetric.DimensionsScheme()
+    {
+      String[] timeBuckets = new String[] { "1s", "1m", "1h" };
+      String[] lengthAggregators = new String[] { "IIRAVG", "SUM" };
+      String[] countAggregators = new String[] { "SUM" };
+
+      /* Setting the aggregation time bucket to be one second, one minute and one hour */
+      @Override
+      public String[] getTimeBuckets()
+      {
+        return timeBuckets;
+      }
+
+      @Override
+      public String[] getDimensionAggregationsFor(String logicalMetricName)
+      {
+        if ("length".equals(logicalMetricName)) {
+          return lengthAggregators;
+        } else if ("count".equals(logicalMetricName)) {
+          return countAggregators;
+        } else {
+          return null; // use default
+        }
+      }
+    };
+
+    dag.setAttribute(lineReceiver, OperatorContext.METRICS_DIMENSIONS_SCHEME, dimensionsScheme);
+    ...
+  }
+```
+
+
+# Dashboards
+With the App Data Tracker enabled, you can visualize the AutoMetrics and system metrics in the Dashboards within dtManage.  As shown in the diagram in the App Data Tracker section, dtGateway relays queries and query results between dtManage and the App Data Tracker, and dtManage uses the results to let the user visualize the data.
+
+Click on the visualize button in dtManage's application page.
+
+![AppDataTracker](images/autometrics/visualize.png)
+
+You will see the dashboard for the AutoMetrics and the system metrics.
+
+![AppDataTracker](images/autometrics/dashboard.png)
+
+The left widget shows the AutoMetrics `length` and `count` for the LineReceiver operator.  The right widget shows the system metrics.
+
+The Dashboards have simple built-in widgets, such as line charts and bar charts, to visualize the data.
+Users can also implement their own widgets to visualize their data.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/configuration_packages.md
----------------------------------------------------------------------
diff --git a/docs/configuration_packages.md b/docs/configuration_packages.md
new file mode 100644
index 0000000..30f1717
--- /dev/null
+++ b/docs/configuration_packages.md
@@ -0,0 +1,242 @@
+Apache Apex Configuration Packages
+==================================
+
+An Apache Apex Application Configuration Package is a zip file that contains
+configuration files and additional files to be launched with an
+[Application Package](application_packages.md) using 
+DTCLI or the REST API.  This guide assumes familiarity with
+Application Packages.  If you have not done so already, please read the
+Application Packages document first to familiarize yourself with the concept.
+
+# Requirements
+
+You will need to have the following installed:
+
+1. Apache Maven 3.0 or later (for assembling the Config Package)
+2. Apex 3.0.0 or later (for launching the App Package with the Config
+    Package in your cluster)
+
+# Creating Your First Configuration Package
+
+You can create a Configuration Package using your Linux command line, or
+using your favorite IDE.  
+
+## Using Command Line
+
+First, change to the directory where you put your projects, and create a
+DT configuration project using Maven by running the following command.
+Replace "com.example", "mydtconfig" and "1.0-SNAPSHOT" with the
+appropriate values:
+
+    $ mvn archetype:generate \
+     -DarchetypeGroupId=org.apache.apex \
+     -DarchetypeArtifactId=apex-conf-archetype -DarchetypeVersion=3.2.0-incubating \
+     -DgroupId=com.example -Dpackage=com.example.mydtconfig -DartifactId=mydtconfig \
+     -Dversion=1.0-SNAPSHOT
+
+This creates a Maven project named "mydtconfig". Open it with your
+favorite IDE (e.g. NetBeans, Eclipse, IntelliJ IDEA).  Try it out by
+running the following command:
+
+```
+$ mvn package                                                         
+```
+
+The "mvn package" command creates the Config Package file in target
+directory as target/mydtconfig.apc. You will be able to use that
+Configuration Package file to launch an Apache Apex application.
+
+## Using IDE 
+
+Alternatively, you can do the above steps all within your IDE.  For
+example, in NetBeans, select File -\> New Project.  Then choose “Maven”
+and “Project from Archetype” in the dialog box, as shown.
+
+![](images/AppConfig/ApplicationConfigurationPackages.html-image01.png)
+
+Then fill the Group ID, Artifact ID, Version and Repository entries as
+shown below.
+
+![](images/AppConfig/ApplicationConfigurationPackages.html-image02.png)
+
+* Group ID: org.apache.apex
+* Artifact ID: apex-conf-archetype
+* Version: 3.2.0-incubating (or any later version)
+
+Press Next and fill out the rest of the required information. For
+example:
+
+![](images/AppConfig/ApplicationConfigurationPackages.html-image00.png)
+
+Click Finish, and now you have created your own Apex
+Configuration Package project.  The procedure for other IDEs, like
+Eclipse or IntelliJ, is similar.
+
+
+# Assembling your own configuration package 
+
+Inside the project created by the archetype, these are the files that
+you should know about when assembling your own configuration package:
+
+    ./pom.xml
+    ./src/main/resources/classpath
+    ./src/main/resources/files
+    ./src/main/resources/META-INF/properties.xml
+    ./src/main/resources/META-INF/properties-{appname}.xml
+
+## pom.xml 
+
+Example:
+
+```xml
+  <groupId>com.example</groupId>
+  <version>1.0.0</version>
+  <artifactId>mydtconf</artifactId>
+  <packaging>jar</packaging>
+  <!-- change these to the appropriate values -->
+  <name>My DataTorrent Application Configuration</name>
+  <description>My DataTorrent Application Configuration Description</description>
+  <properties>
+    <datatorrent.apppackage.name>mydtapp</datatorrent.apppackage.name>
+    <datatorrent.apppackage.minversion>1.0.0</datatorrent.apppackage.minversion>
+    <datatorrent.apppackage.maxversion>1.9999.9999</datatorrent.apppackage.maxversion>
+    <datatorrent.appconf.classpath>classpath/*</datatorrent.appconf.classpath>
+    <datatorrent.appconf.files>files/*</datatorrent.appconf.files>
+  </properties> 
+
+```
+In pom.xml, you can change the following keys to your desired values
+
+* ```<groupId>```
+* ```<version>```
+* ```<artifactId>```
+* ```<name>```
+* ```<description>```
+
+You can also change the values of 
+
+* ```<datatorrent.apppackage.name>```
+* ```<datatorrent.apppackage.minversion>```
+* ```<datatorrent.apppackage.maxversion>```
+
+to reflect which app packages can be used with this configuration package.  Apex will use this information to check whether a
+configuration package is compatible with the application package when you issue a launch command.
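
Conceptually, the launch-time check verifies that the app package version falls within the `[minversion, maxversion]` range, comparing versions component by component. A simplified sketch of such a check (not the actual Apex implementation):

```java
import java.util.Arrays;

// Simplified min/max version compatibility check.  Versions are compared
// component-by-component as integers (so 1.9999.9999 sorts above 1.10.0).
public class VersionRange {
  static int compare(String a, String b) {
    int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
    int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      int xi = i < x.length ? x[i] : 0;   // missing components count as 0
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  // True if appVersion lies within [min, max] inclusive.
  public static boolean compatible(String appVersion, String min, String max) {
    return compare(appVersion, min) >= 0 && compare(appVersion, max) <= 0;
  }

  public static void main(String[] args) {
    System.out.println(compatible("1.2.0", "1.0.0", "1.9999.9999")); // true
    System.out.println(compatible("2.0.0", "1.0.0", "1.9999.9999")); // false
  }
}
```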
+
+## ./src/main/resources/classpath 
+
+Place in this directory any file that you would like copied to the
+compute machines when launching an application and included in the
+classpath of the application.  Examples of such files are Java properties
+files and jar files.
+
+## ./src/main/resources/files 
+
+Place in this directory any file that you would like copied to the
+compute machines when launching an application but not included in the
+classpath of the application.
+
+## Properties XML file
+
+A properties XML file consists of a set of key-value pairs that
+specify the configuration options the application should be
+launched with.
+
+Example:
+```xml
+<configuration>
+  <property>
+    <name>some-property-name</name>
+    <value>some-property-value</value>
+  </property>
+   ...
+</configuration>
+```
+Names of properties XML files:
+
+*  **properties.xml:** Properties that are global to the Configuration
+Package
+*  **properties-{appName}.xml:** Properties that apply only when launching
+an application with the specified appName within the Application
+Package.
+
+After you are done with the above, remember to run `mvn package` to
+generate a new configuration package, which will be located in the
+target directory of your project.
+
+## Zip structure of configuration package 
+Apex Application Configuration Package files are zip files.  You
+can examine the content of any Application Configuration Package by
+running `unzip -t` on it from your Linux command line.  The structure
+of the zip file is as follows:
+
+```
+META-INF
+  MANIFEST.MF
+  properties.xml
+  properties-{appname}.xml
+classpath
+  {classpath files}
+files
+  {files} 
+```
+
+
+
+# Launching with CLI
+
+The `-conf` option of the launch command in the CLI supports specifying a configuration package in the local filesystem.  Example:
+
+    dt> launch DTApp-mydtapp-1.0.0.jar -conf DTConfig-mydtconfig-1.0.0.jar
+
+This command expects both the application package and the configuration package to be in the local file system.
+
+
+
+# Related REST API 
+
+### POST /ws/v2/configPackages
+
+Payload: Raw content of configuration package zip
+
+Function: Creates or replaces a configuration package zip file in HDFS
+
+Curl example:
+
+    $ curl -XPOST -T DTConfig-{name}.jar http://{yourhost:port}/ws/v2/configPackages
+
+### GET /ws/v2/configPackages?appPackageName=...&appPackageVersion=... 
+
+All query parameters are optional.
+
+Function: Returns the configuration packages that the user is authorized to use and that are compatible with the specified appPackageName, appPackageVersion and appName. 
+
+### GET /ws/v2/configPackages/``<user>``?appPackageName=...&appPackageVersion=... 
+
+All query parameters are optional.
+
+Function: Returns the configuration packages under the specified user and that are compatible with the specified appPackageName, appPackageVersion and appName.
+
+### GET /ws/v2/configPackages/```<user>```/```<name>``` 
+
+Function: Returns the information of the specified configuration package
+
+### GET /ws/v2/configPackages/```<user>```/```<name>```/download 
+
+Function: Returns the raw config package file
+
+Curl example:
+
+```sh
+$ curl http://{yourhost:port}/ws/v2/configPackages/{user}/{name}/download > DTConfig-xyz.jar
+$ unzip -t DTConfig-xyz.jar
+```
+
+### POST /ws/v2/appPackages/```<user>```/```<app-pkg-name>```/```<app-pkg-version>```/applications/{app-name}/launch?configPackage=```<user>```/```<confpkgname>```
+
+Function: Launches the app package with the specified configuration package stored in HDFS.
+
+Curl example:
+
+```sh
+$ curl -XPOST -d '{}' http://{yourhost:port}/ws/v2/appPackages/{user}/{app-pkg-name}/{app-pkg-version}/applications/{app-name}/launch?configPackage={user}/{confpkgname}
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/dtcli.md
----------------------------------------------------------------------
diff --git a/docs/dtcli.md b/docs/dtcli.md
new file mode 100644
index 0000000..813a27f
--- /dev/null
+++ b/docs/dtcli.md
@@ -0,0 +1,273 @@
+Apache Apex Command Line Interface
+================================================================================
+
+dtCli, the Apache Apex command line interface, can be used to launch, monitor, and manage
+Apache Apex applications.  dtCli is a wrapper around the [REST API](dtgateway_api.md) provided by dtGateway, and
+provides a developer-friendly way of interacting with the Apache Apex platform.  The CLI offers a much richer
+feature set by hiding the deep details of the REST API.  dtCli also provides scope by connecting to and executing
+commands in the context of a specific application, and it enables easy integration with existing enterprise toolsets
+for automated application monitoring and management.  Currently the following high-level tasks are supported:
+
+-   Launch or kill applications
+-   View system metrics including load, throughput, latency, etc.
+-   Start or stop tuple recording
+-   Read operator, stream, port properties and attributes
+-   Write to operator properties
+-   Dynamically change the application logical plan
+-   Create custom macros
+
+
+## dtcli Commands
+
+dtCli can be launched by running the following command on the same machine where dtGateway is installed:
+
+    dtcli
+
+Help on all commands is available via the `help` command in the CLI.
+
+### Global Commands
+
+```
+GLOBAL COMMANDS EXCEPT WHEN CHANGING LOGICAL PLAN:
+
+alias alias-name command
+	Create a command alias
+
+begin-macro name
+	Begin Macro Definition ($1...$9 to access parameters and type 'end' to end the definition)
+
+connect app-id
+	Connect to an app
+
+dump-properties-file out-file jar-file class-name
+	Dump the properties file of an app class
+
+echo [arg ...]
+	Echo the arguments
+
+exit
+	Exit the CLI
+
+get-app-info app-id
+	Get the information of an app
+
+get-app-package-info app-package-file
+	Get info on the app package file
+
+get-app-package-operator-properties app-package-file operator-class
+	Get operator properties within the given app package
+
+get-app-package-operators [options] app-package-file [search-term]
+	Get operators within the given app package
+	Options:
+            -parent    Specify the parent class for the operators
+
+get-config-parameter [parameter-name]
+	Get the configuration parameter
+
+get-jar-operator-classes [options] jar-files-comma-separated [search-term]
+	List operators in a jar list
+	Options:
+            -parent    Specify the parent class for the operators
+
+get-jar-operator-properties jar-files-comma-separated operator-class-name
+	List properties in specified operator
+
+help [command]
+	Show help
+
+kill-app app-id [app-id ...]
+	Kill an app
+
+launch [options] jar-file/json-file/properties-file/app-package-file [matching-app-name]
+	Launch an app
+	Options:
+            -apconf <app package configuration file>        Specify an application
+                                                            configuration file
+                                                            within the app
+                                                            package if launching
+                                                            an app package.
+            -archives <comma separated list of archives>    Specify comma
+                                                            separated archives
+                                                            to be unarchived on
+                                                            the compute machines.
+            -conf <configuration file>                      Specify an
+                                                            application
+                                                            configuration file.
+            -D <property=value>                             Use value for given
+                                                            property.
+            -exactMatch                                     Only consider
+                                                            applications with
+                                                            exact app name
+            -files <comma separated list of files>          Specify comma
+                                                            separated files to
+                                                            be copied on the
+                                                            compute machines.
+            -ignorepom                                      Do not run maven to
+                                                            find the dependency
+            -libjars <comma separated list of libjars>      Specify comma
+                                                            separated jar files
+                                                            or other resource
+                                                            files to include in
+                                                            the classpath.
+            -local                                          Run application in
+                                                            local mode.
+            -originalAppId <application id>                 Specify original
+                                                            application
+                                                            identifier for restart.
+            -queue <queue name>                             Specify the queue to
+                                                            launch the application
+
+list-application-attributes
+	Lists the application attributes
+list-apps [pattern]
+	List applications
+list-operator-attributes
+	Lists the operator attributes
+list-port-attributes
+	Lists the port attributes
+set-pager on/off
+	Set the pager program for output
+show-logical-plan [options] jar-file/app-package-file [class-name]
+	List apps in a jar or show logical plan of an app class
+	Options:
+            -exactMatch                                Only consider exact match
+                                                       for app name
+            -ignorepom                                 Do not run maven to find
+                                                       the dependency
+            -libjars <comma separated list of jars>    Specify comma separated
+                                                       jar/resource files to
+                                                       include in the classpath.
+shutdown-app app-id [app-id ...]
+	Shutdown an app
+source file
+	Execute the commands in a file
+```
+
+### Commands after connecting to an application
+
+```
+COMMANDS WHEN CONNECTED TO AN APP (via connect <appid>) EXCEPT WHEN CHANGING LOGICAL PLAN:
+
+begin-logical-plan-change
+	Begin Logical Plan Change
+dump-properties-file out-file [jar-file] [class-name]
+	Dump the properties file of an app class
+get-app-attributes [attribute-name]
+	Get attributes of the connected app
+get-app-info [app-id]
+	Get the information of an app
+get-operator-attributes operator-name [attribute-name]
+	Get attributes of an operator
+get-operator-properties operator-name [property-name]
+	Get properties of a logical operator
+get-physical-operator-properties [options] operator-id
+	Get properties of a physical operator
+	Options:
+            -propertyName <property name>    The name of the property whose
+                                             value needs to be retrieved
+            -waitTime <wait time>            How long to wait to get the result
+get-port-attributes operator-name port-name [attribute-name]
+	Get attributes of a port
+get-recording-info [operator-id] [start-time]
+	Get tuple recording info
+kill-app [app-id ...]
+	Kill an app
+kill-container container-id [container-id ...]
+	Kill a container
+list-containers
+	List containers
+list-operators [pattern]
+	List operators
+set-operator-property operator-name property-name property-value
+	Set a property of an operator
+set-physical-operator-property operator-id property-name property-value
+	Set a property of an operator
+show-logical-plan [options] [jar-file/app-package-file] [class-name]
+	Show logical plan of an app class
+	Options:
+            -exactMatch                                Only consider exact match
+                                                       for app name
+            -ignorepom                                 Do not run maven to find
+                                                       the dependency
+            -libjars <comma separated list of jars>    Specify comma separated
+                                                       jar/resource files to
+                                                       include in the classpath.
+show-physical-plan
+	Show physical plan
+shutdown-app [app-id ...]
+	Shutdown an app
+start-recording operator-id [port-name] [num-windows]
+	Start recording
+stop-recording operator-id [port-name]
+	Stop recording
+wait timeout
+	Wait for completion of current application
+```
+
+### Commands when changing the logical plan
+
+```
+COMMANDS WHEN CHANGING LOGICAL PLAN (via begin-logical-plan-change):
+
+abort
+	Abort the plan change
+add-stream-sink stream-name to-operator-name to-port-name
+	Add a sink to an existing stream
+create-operator operator-name class-name
+	Create an operator
+create-stream stream-name from-operator-name from-port-name to-operator-name to-port-name
+	Create a stream
+help [command]
+	Show help
+remove-operator operator-name
+	Remove an operator
+remove-stream stream-name
+	Remove a stream
+set-operator-attribute operator-name attr-name attr-value
+	Set an attribute of an operator
+set-operator-property operator-name property-name property-value
+	Set a property of an operator
+set-port-attribute operator-name port-name attr-name attr-value
+	Set an attribute of a port
+set-stream-attribute stream-name attr-name attr-value
+	Set an attribute of a stream
+show-queue
+	Show the queue of the plan change
+submit
+	Submit the plan change
+```
+
+
+
+## Examples
+
+An example of defining a custom macro follows.  The macro updates a running application by inserting a new operator.  It takes three parameters and executes a logical plan change.
+
+```
+dt> begin-macro add-console-output
+macro> begin-logical-plan-change
+macro> create-operator $1 com.datatorrent.lib.io.ConsoleOutputOperator
+macro> create-stream stream_$1 $2 $3 $1 in
+macro> submit
+```
+
+
+Then execute the `add-console-output` macro like this:
+
+```
+dt> add-console-output xyz opername portname
+```
+
+This macro then expands to the following commands:
+
+```
+begin-logical-plan-change
+create-operator xyz com.datatorrent.lib.io.ConsoleOutputOperator
+create-stream stream_xyz opername portname xyz in
+submit
+```
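
The `$1`…`$9` substitution that produces this expansion can be sketched as follows (a hypothetical helper, not dtCli's implementation):

```java
// Replaces $1..$9 in a macro line with the corresponding positional argument.
public class MacroExpander {
  public static String expand(String line, String... args) {
    for (int i = args.length; i >= 1; i--) {   // highest index first
      line = line.replace("$" + i, args[i - 1]);
    }
    return line;
  }

  public static void main(String[] args) {
    System.out.println(expand("create-stream stream_$1 $2 $3 $1 in",
                              "xyz", "opername", "portname"));
  }
}
```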
+
+
+*Note*: To perform runtime logical plan changes, such as adding new operators,
+the operator classes must be part of the jar files that were deployed at application launch time.

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/AppConfig/ApplicationConfigurationPackages.html-image00.png
----------------------------------------------------------------------
diff --git a/docs/images/AppConfig/ApplicationConfigurationPackages.html-image00.png b/docs/images/AppConfig/ApplicationConfigurationPackages.html-image00.png
new file mode 100644
index 0000000..30ad3e4
Binary files /dev/null and b/docs/images/AppConfig/ApplicationConfigurationPackages.html-image00.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/AppConfig/ApplicationConfigurationPackages.html-image01.png
----------------------------------------------------------------------
diff --git a/docs/images/AppConfig/ApplicationConfigurationPackages.html-image01.png b/docs/images/AppConfig/ApplicationConfigurationPackages.html-image01.png
new file mode 100644
index 0000000..5b3623d
Binary files /dev/null and b/docs/images/AppConfig/ApplicationConfigurationPackages.html-image01.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/AppConfig/ApplicationConfigurationPackages.html-image02.png
----------------------------------------------------------------------
diff --git a/docs/images/AppConfig/ApplicationConfigurationPackages.html-image02.png b/docs/images/AppConfig/ApplicationConfigurationPackages.html-image02.png
new file mode 100644
index 0000000..65a8aee
Binary files /dev/null and b/docs/images/AppConfig/ApplicationConfigurationPackages.html-image02.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/MalharOperatorOverview.png
----------------------------------------------------------------------
diff --git a/docs/images/MalharOperatorOverview.png b/docs/images/MalharOperatorOverview.png
new file mode 100644
index 0000000..40bee4a
Binary files /dev/null and b/docs/images/MalharOperatorOverview.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/apex_logo.png
----------------------------------------------------------------------
diff --git a/docs/images/apex_logo.png b/docs/images/apex_logo.png
new file mode 100644
index 0000000..baa25ca
Binary files /dev/null and b/docs/images/apex_logo.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image00.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image00.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image00.png
new file mode 100644
index 0000000..87ebce4
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image00.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image01.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image01.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image01.png
new file mode 100644
index 0000000..4cdee33
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image01.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image02.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image02.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image02.png
new file mode 100644
index 0000000..5bf041c
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image02.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image03.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image03.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image03.png
new file mode 100644
index 0000000..e00bba5
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image03.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image04.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image04.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image04.png
new file mode 100644
index 0000000..8f62361
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image04.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image05.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image05.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image05.png
new file mode 100644
index 0000000..f9ea8d9
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image05.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image06.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image06.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image06.png
new file mode 100644
index 0000000..346690c
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image06.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image07.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image07.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image07.png
new file mode 100644
index 0000000..e57a8eb
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image07.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image08.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image08.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image08.png
new file mode 100644
index 0000000..a363f94
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image08.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/application_development/ApplicationDeveloperGuide.html-image09.png
----------------------------------------------------------------------
diff --git a/docs/images/application_development/ApplicationDeveloperGuide.html-image09.png b/docs/images/application_development/ApplicationDeveloperGuide.html-image09.png
new file mode 100644
index 0000000..8a0252b
Binary files /dev/null and b/docs/images/application_development/ApplicationDeveloperGuide.html-image09.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/autometrics/adt.png
----------------------------------------------------------------------
diff --git a/docs/images/autometrics/adt.png b/docs/images/autometrics/adt.png
new file mode 100644
index 0000000..187bbd4
Binary files /dev/null and b/docs/images/autometrics/adt.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/autometrics/dashboard.png
----------------------------------------------------------------------
diff --git a/docs/images/autometrics/dashboard.png b/docs/images/autometrics/dashboard.png
new file mode 100644
index 0000000..c4ebb39
Binary files /dev/null and b/docs/images/autometrics/dashboard.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/autometrics/visualize.png
----------------------------------------------------------------------
diff --git a/docs/images/autometrics/visualize.png b/docs/images/autometrics/visualize.png
new file mode 100644
index 0000000..fb2e780
Binary files /dev/null and b/docs/images/autometrics/visualize.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/operator/image00.png
----------------------------------------------------------------------
diff --git a/docs/images/operator/image00.png b/docs/images/operator/image00.png
new file mode 100644
index 0000000..14588aa
Binary files /dev/null and b/docs/images/operator/image00.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/operator/image01.png
----------------------------------------------------------------------
diff --git a/docs/images/operator/image01.png b/docs/images/operator/image01.png
new file mode 100644
index 0000000..626c6b5
Binary files /dev/null and b/docs/images/operator/image01.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/operator/image02.png
----------------------------------------------------------------------
diff --git a/docs/images/operator/image02.png b/docs/images/operator/image02.png
new file mode 100644
index 0000000..2be9433
Binary files /dev/null and b/docs/images/operator/image02.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/operator/image03.png
----------------------------------------------------------------------
diff --git a/docs/images/operator/image03.png b/docs/images/operator/image03.png
new file mode 100644
index 0000000..67802d8
Binary files /dev/null and b/docs/images/operator/image03.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/operator/image04.png
----------------------------------------------------------------------
diff --git a/docs/images/operator/image04.png b/docs/images/operator/image04.png
new file mode 100644
index 0000000..58d99d9
Binary files /dev/null and b/docs/images/operator/image04.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/images/operator/image05.png
----------------------------------------------------------------------
diff --git a/docs/images/operator/image05.png b/docs/images/operator/image05.png
new file mode 100644
index 0000000..9ac6f21
Binary files /dev/null and b/docs/images/operator/image05.png differ

http://git-wip-us.apache.org/repos/asf/incubator-apex-core/blob/44f220fd/docs/operator_development.md
----------------------------------------------------------------------
diff --git a/docs/operator_development.md b/docs/operator_development.md
new file mode 100644
index 0000000..f502725
--- /dev/null
+++ b/docs/operator_development.md
@@ -0,0 +1,449 @@
+Operator Development Guide
+==========================
+
+Operators are the basic building blocks of an application built to run on
+the Apache Apex platform. An application consists of one or more
+operators, each of which defines some logical operation to be performed on the
+tuples arriving at the operator. These operators are connected together
+using streams, forming a Directed Acyclic Graph (DAG). In other words, a streaming
+application is represented by a DAG that consists of operations (called operators) and
+data flows (called streams).
+
+In this document we will discuss how an operator works and describe
+its internals. This document is intended to serve the following purposes:
+
+1.  **[Apache Apex Operators](#apex_operators)** - Introduction to operator terminology and concepts.
+2.  **[Writing Custom Operators](#writing_custom_operators)** - Designing, coding and testing new operators from scratch.  Includes code examples.
+3.  **[Operator Reference](#operator_reference)** - Details of operator internals, lifecycle, and best practices and optimizations.
+
+* * * * *
+
+Apache Apex Operators <a name="apex_operators"></a>
+==========================================
+
+Operators - “What” in a nutshell
+--------------------------------
+
+Operators are independent units of logical operation which contribute
+to executing the business logic of a use case. For example,
+in an ETL workflow, a filtering operation can be represented by a single
+operator. This filtering operator is responsible for doing just one
+task in the ETL pipeline, i.e. filtering incoming tuples. Operators do not
+impose any restrictions on what can or cannot be done as part of an
+operator; an operator may well contain the entire business logic.
+However, it is recommended that operators be lightweight,
+independent tasks, in
+order to take advantage of the distributed framework that Apache Apex
+provides. The structure of a streaming application resembles
+the way CPU pipelining works. CPU pipelining breaks down the
+computation into different stages, viz. instruction fetch,
+instruction decode, etc., so that each stage can perform its task on a
+different instruction
+in parallel. Similarly,
+the Apache Apex APIs allow the user to break down a task into different
+stages so that all of the stages can execute on different tuples
+in parallel.
+
+![](images/operator/image05.png)
+
+Operators - “How” in a nutshell
+-------------------------------
+
+An Apache Apex application runs as a YARN application. Hence, each of
+the operators that the application DAG contains runs in one of the
+containers provisioned by YARN. Further, Apache Apex exposes APIs that
+allow the user to request bundling multiple operators in a single node,
+a single container, or even a single thread. We shall look at these calls
+in the [Operator Reference](#operator_reference) section. For now, consider
+an operator as a piece of code that runs on some machine of a YARN
+cluster.
+
+Types of Operators
+------------------
+
+An operator works on one tuple at a time. These tuples may be supplied
+by other operators in the application or by external sources,
+such as a database or a message bus. Similarly, after the tuples are
+processed, they may be passed on to other operators, or stored in an external system.
+There are 3 types of operators based on their function:
+
+1.  **Input Adapter** - This is one of the starting points in
+    the application DAG and is responsible for getting tuples from an
+    external system. Alternatively, such data may also be generated
+    by the operator itself, without interacting with the outside
+    world. These input tuples form the initial universe of
+    data that the application works on.
+2.  **Generic Operator** - This type of operator accepts input tuples from
+    the previous operators and passes them on to the following operators
+    in the DAG.
+3.  **Output Adapter** - This is one of the ending points in the application
+    DAG and is responsible for writing the data out to some external
+    system.
+
+Note: There can be multiple operators of each type in an application
+DAG.
+
+Operator Positions in a DAG
+-----------------------------------
+
+We may refer to operators depending on their position with respect to
+one another. For any operator opr (see image below), other operators fall into two categories:
+
+1.  **Upstream operators** - These are the operators from which there is a
+    directed path to opr in the application DAG.
+2.  **Downstream operators** - These are the operators to which there is a
+    directed path from opr in the application DAG.
+
+Note that there are no cycles formed in the application DAG.
+
+![](images/operator/image00.png)
+
+Ports
+-----
+
+Operators in a DAG are connected together via directed flows
+called streams. Each stream has end-points located on the operators
+called ports. There are 2 types of ports.
+
+1.  **Input Port** - This is a port through which an operator accepts input
+    tuples from an upstream operator.
+2.  **Output port** - This is a port through which an operator passes on the
+    processed data to downstream operators.
+
+Classifying operators by their ports: an Input Adapter is an operator
+with no input ports, a Generic operator has both input and output ports,
+and an Output Adapter has no output ports. Note, however, that
+an operator may act as an Input Adapter and at the same time have an
+input port. In such a case, the operator is getting data from two
+different sources, viz. the input stream from the input port and an
+external source.
+
+![](images/operator/image02.png)
+
+* * * * *
+
+How an Operator Works
+----------------------
+
+An operator passes through various stages during its lifetime. Each
+stage is an API call that the Streaming Application Master makes for an
+operator.  The following figure illustrates the stages through which an
+operator passes.
+
+![](images/operator/image01.png)
+
+-   The _setup()_ call initializes the operator and prepares it to
+    start processing tuples.
+-   The _beginWindow()_ call marks the beginning of an application window
+    and allows for any processing to be done before a window starts.
+-   The _process()_ call belongs to the _InputPort_ and is triggered when
+    a tuple arrives at the input port of the operator. This call is
+    specific to Generic operators and Output Adapters, since Input Adapters
+    do not have an input port. It is made for every tuple arriving at the
+    input port until the end window marker tuple is received on the
+    input port.
+-   The _emitTuples()_ call is the counterpart of the _process()_ call for Input
+    Adapters.
+    This call is used by Input Adapters to emit any tuples that are
+    fetched from the external systems, or generated by the operator.
+    This method is called continuously until the pre-configured window
+    time has elapsed, at which point the end window marker tuple is sent out on
+    the output port.
+-   The _endWindow()_ call marks the end of the window and allows for any
+    processing to be done after the window ends.
+-   The _teardown()_ call is used for gracefully shutting down the
+    operator and releasing any resources held by the operator.
+
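The call sequence above can be sketched with a toy harness. Note that these are simplified stand-in interfaces, not the actual `com.datatorrent.api` types; the sketch only illustrates the order in which the engine drives the lifecycle calls:

``` java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the Apex operator lifecycle, illustrating call order only.
// The real engine invokes these methods on com.datatorrent.api.Operator implementations.
public class LifecycleSketch {
    interface MiniOperator {
        void setup();
        void beginWindow(long windowId);
        void process(String tuple);   // input-port callback
        void endWindow();
        void teardown();
    }

    // Records each lifecycle call so the ordering is visible.
    static class RecordingOperator implements MiniOperator {
        final List<String> calls = new ArrayList<>();
        public void setup()               { calls.add("setup"); }
        public void beginWindow(long id)  { calls.add("beginWindow:" + id); }
        public void process(String tuple) { calls.add("process:" + tuple); }
        public void endWindow()           { calls.add("endWindow"); }
        public void teardown()            { calls.add("teardown"); }
    }

    // Drives one operator through a sequence of windows, mimicking the engine:
    // setup once, then beginWindow / process* / endWindow per window, teardown once.
    static List<String> run(RecordingOperator op, List<List<String>> windows) {
        op.setup();
        long windowId = 0;
        for (List<String> window : windows) {
            op.beginWindow(windowId++);
            for (String tuple : window) {
                op.process(tuple);
            }
            op.endWindow();
        }
        op.teardown();
        return op.calls;
    }
}
```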
+Developing Custom Operators <a name="writing_custom_operators"></a>
+====================================
+
+About this tutorial
+-------------------
+
+This tutorial will guide the user through developing an operator from
+scratch. It covers all aspects of writing an operator, including
+design, code and unit testing.
+
+Introduction
+------------
+
+In this tutorial, we will design and write, from scratch, an operator
+called Word Count. This operator will accept tuples of type String,
+count the number of occurrences for each word appearing in the tuple and
+send out the updated counts for all the words encountered in the tuple.
+Further, the operator will also accept a file path on HDFS which will
+contain the stop-words which need to be ignored when counting
+occurrences.
+
+Design
+------
+
+The design of the operator must be finalized before starting to write it.
+Many aspects, including the functionality, the data sources, and
+the types involved, need to be finalized before writing the
+operator. Let us dive into each of these while considering the Word
+Count operator.
+
+### Functionality
+
+We can define the scope of operator functionality using the following
+tasks:
+
+1.  Parse the input tuple to identify the words in the tuple
+2.  Identify the stop-words in the tuple by looking up the stop-word
+    file as configured
+3.  For each non-stop-word in the tuple, count the occurrences in that
+    tuple and add them to the global counts
+
+Let’s consider an example. Suppose we have the following tuples flow
+into the Word Count operator.
+
+1.  _Humpty dumpty sat on a wall_
+2.  _Humpty dumpty had a great fall_
+
+Initially, the counts for all words are 0. Once the first tuple is processed,
+the counts that must be emitted are:
+
+```
+humpty - 1
+dumpty - 1
+sat - 1
+wall - 1
+```
+
+Note that we are ignoring the stop-words, “on” and “a” in this case.
+Also note that as a rule, we’ll ignore the case of the words when
+counting occurrences.
+
+Similarly, after the second tuple is processed, the counts that must be
+emitted are:
+
+```
+humpty - 2
+dumpty - 2
+great - 1
+fall - 1
+```
+
+Again, we ignore the words _“had”_ and _“a”_ since these are stop-words.
+
+Note that the most recent count for any word is the correct count for that
+word. In other words, any new output for a word invalidates all the
+previous counts for that word.
+
+### Inputs
+
+As seen from the example above, the following inputs are expected for
+the operator:
+
+1.  Input stream whose tuple type is String
+2.  Input HDFS file path, pointing to a file containing stop-words
+
+Only one input port is needed. The stop-word file will be small enough
+to be read completely in a single read. In addition, reading it is a one-time
+activity for the lifetime of the operator, so it does not need a
+separate input port.
+
+![](images/operator/image03.png)
+
+### Outputs
+
+We can define the output for this operator in multiple ways.
+
+1.  The operator may send out the set of counts for which the counts
+    have changed after processing each tuple.
+2.  Some applications might not need an update after every tuple, but
+    only after a certain time duration.
+
+Let us implement both of these options, selected by configuration. We
+define a boolean configuration parameter
+_“sendPerTuple”_. The value of this parameter indicates whether the
+updated counts for words are emitted after processing each
+tuple (true) or after a certain time duration (false).
+
+The type of information the operator sends out on the output
+port is the same in all cases: a _< key, value >_ pair,
+where the key is the word and the value is the latest count for that
+word. This means we need just one output port on which this information
+will go out.
+
+![](images/operator/image04.png)
+
+Configuration
+-------------
+
+We have the following configuration parameters:
+
+1.  _stopWordFilePath_ - This parameter will store the path to the stop
+    word file on HDFS as configured by the user.
+2.  _sendPerTuple_ - This parameter decides whether we send out the
+    updated counts after processing each tuple or at the end of a
+    window. When set to true, the operator will send out the updated
+    counts after each tuple, else it will send at the end of
+    each window.
+
+Code
+----
+
+The source code for the tutorial can be found here:
+
+[https://github.com/DataTorrent/examples/tree/master/tutorials/operatorTutorial](https://github.com/DataTorrent/examples/tree/master/tutorials/operatorTutorial)
+
+
+Operator Reference <a name="operator_reference"></a>
+====================================
+
+
+### The Operator Class
+
+The operator exists physically as a class which implements the
+Operator interface. This interface requires implementations of the
+following method calls:
+
+-  setup(OperatorContext context)
+-  beginWindow(long windowId)
+-  endWindow()
+-  teardown()
+
+In order to simplify the creation of an operator, the Apache Apex
+library also provides a base class “BaseOperator” which has empty
+implementations for these methods. Please refer to the [Apex Operators](#apex_operators) section and the
+[Reference](#operator_reference) section for details on these.
+
+We extend the class “BaseOperator” to create our own operator
+“WordCountOperator”.
+
+``` java
+public class WordCountOperator extends BaseOperator
+{
+}
+```
+
+### Class (Operator) properties
+
+We define the following class variables:
+
+-   _sendPerTuple_ - Configures the output frequency from the operator
+``` java
+private boolean sendPerTuple = true; // default
+```
+-   _stopWordFilePath_ - Stores the path to the stop words file on HDFS
+``` java
+private String stopWordFilePath; // no default
+```
+-   _stopWords_ - Stores the stop words read from the configured file
+``` java
+private transient String[] stopWords;
+```
+-   _globalCounts_ - A Map which stores the counts of all the words
+    encountered so far. Note that this variable is non-transient, which
+    means it is saved as part of the checkpoint and can be recovered in the event of a crash.
+``` java
+private Map<String, Long> globalCounts;
+```
+-   _updatedCounts_ - A Map which stores the counts for only the most
+    recent tuple(s). The sendPerTuple configuration determines whether it stores counts for the
+    most recent tuple or for the most recent window of tuples.
+``` java
+private transient Map<String, Long> updatedCounts;
+```
+-   _input_ - The input port for the operator. The type of this input port
+    is String, which means it will only accept tuples of type String. The
+    definition of an input port requires the implementation of a method
+    called process(String tuple), which holds the processing logic
+    for each input tuple that arrives at this input port. We delegate
+    this task to another method called processTuple(String tuple). This
+    helps keep the operator class extensible, by allowing subclasses to
+    override the processing logic for input tuples.
+``` java
+public transient DefaultInputPort<String> input = new DefaultInputPort<String>()
+{
+    @Override
+    public void process(String tuple)
+    {
+        processTuple(tuple);
+    }
+};
+```
+-   _output_ - The output port for the operator. The type of this port is
+    Entry < String, Long >, which means the operator will emit < word,
+    count > pairs for the updated counts.
+``` java
+public transient DefaultOutputPort<Entry<String, Long>> output = new DefaultOutputPort<Entry<String, Long>>();
+```
+
+### The Constructor
+
+The constructor is the place where we initialize the non-transient data
+structures, since the
+constructor is called just once per activation of an operator. For the Word Count operator, we initialize the globalCounts variable in the constructor.
+``` java
+globalCounts = Maps.newHashMap();
+```
+### Setup call
+
+The setup method is called only once during an operator's lifetime, and its purpose is to allow
+the operator to set itself up for processing incoming streams. Transient objects in the operator are
+not serialized and checkpointed. Hence, it is essential that such objects are initialized in the setup call.
+In case of operator failure, the operator will be redeployed (most likely in a different container). The setup method, called by the Apache Apex engine, allows the operator to prepare for execution in the new container.
+
+The following tasks are executed as part of the setup call:
+
+1.  Read the stop-word list from HDFS and store it in the
+    stopWords array
+2.  Initialize the updatedCounts variable. This stores the updated
+    counts for words in the most recent tuples processed by the operator.
+    As a transient variable, its value is lost when the operator fails.
+
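A sketch of what the stop-word loading might look like. The actual tutorial code reads from HDFS via the Hadoop FileSystem API; to stay self-contained, this sketch parses from a plain java.io.Reader (which could wrap an HDFS input stream), and the class and method names are illustrative rather than the real tutorial code:

``` java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StopWordLoader {
    // Parses stop words from a reader: one or more words per line,
    // separated by whitespace or commas, normalized to lower case
    // (the operator ignores case when counting).
    public static String[] readStopWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                for (String w : line.split("[,\\s]+")) {
                    if (!w.isEmpty()) {
                        words.add(w.toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // In the real setup(), the StringReader would be replaced by a
        // reader over the HDFS file at stopWordFilePath.
        String[] stopWords = readStopWords(new StringReader("on, a\nthe had"));
        System.out.println(stopWords.length); // 4 words parsed
    }
}
```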
+### Begin Window call
+
+The begin window call signals the start of an application window. For the
+Word Count operator, we expect updated counts for the most recent window of
+data if sendPerTuple is set to false. Hence, we clear the updatedCounts variable in the begin window
+call and start accumulating the counts till the end window call.
+
+### Process Tuple call
+
+The processTuple method is called by the process method of the input
+port, input. This method defines the processing logic for the current
+tuple that is received at the input port. As part of this method, we
+identify the words in the current tuple and update the globalCounts and
+the updatedCounts variables. In addition, if the sendPerTuple variable
+is set to true, we also emit the words and corresponding counts in
+updatedCounts to the output port. Note that in this case (sendPerTuple =
+true), we clear the updatedCounts variable in every call to
+processTuple.
+
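Stripped of the Apex plumbing, the logic described above might look roughly like this self-contained sketch. Here an in-memory list stands in for the output port, and all names are illustrative, not the actual tutorial code:

``` java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Self-contained sketch of the word-counting logic in processTuple().
// In the real operator, emit() would be output.emit(entry) on the output port.
public class WordCountSketch {
    private final Set<String> stopWords;
    private final boolean sendPerTuple;
    final Map<String, Long> globalCounts = new HashMap<>();   // non-transient in the real operator
    final Map<String, Long> updatedCounts = new HashMap<>();  // transient in the real operator
    final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    WordCountSketch(Set<String> stopWords, boolean sendPerTuple) {
        this.stopWords = stopWords;
        this.sendPerTuple = sendPerTuple;
    }

    void processTuple(String tuple) {
        if (sendPerTuple) {
            updatedCounts.clear();             // per-tuple mode: fresh deltas for each tuple
        }
        for (String raw : tuple.split("\\s+")) {
            String word = raw.toLowerCase(Locale.ROOT);   // case-insensitive counting
            if (word.isEmpty() || stopWords.contains(word)) {
                continue;                      // ignore stop-words, as per the design
            }
            long count = globalCounts.merge(word, 1L, Long::sum);
            updatedCounts.put(word, count);    // latest count wins
        }
        if (sendPerTuple) {
            for (Map.Entry<String, Long> e : updatedCounts.entrySet()) {
                emit(e);                       // otherwise endWindow() would emit instead
            }
        }
    }

    void emit(Map.Entry<String, Long> entry) {
        emitted.add(new AbstractMap.SimpleEntry<>(entry));
    }
}
```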
+### End Window call
+
+This call signals the end of an application window. For the Word
+Count operator, we emit the updatedCounts to the output port if the
+sendPerTuple flag is set to false.
+
+### Teardown call
+
+This method allows the operator to gracefully shut itself down after
+releasing the resources it has acquired. For our operator,
+we call the shutDown method, which shuts down the operator along with any
+downstream operators.
+
+Testing your Operator
+---------------------
+
+As part of testing our operator, we test the following two facets:
+
+1.  Test output of the operator after processing a single tuple
+2.  Test output of the operator after processing of a window of tuples
+
+The unit tests for the WordCount operator are available in the class
+WordCountOperatorTest.java. We simulate the behavior of the engine
+using the test utilities provided by the Apache Apex libraries. We simulate
+the setup, beginWindow, input-port process, and endWindow calls, and compare
+the output received at the simulated output ports.
+
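The test pattern can be illustrated with a toy per-window counter driven through the lifecycle calls by hand. The real tests use the Apex test utilities; here a plain list stands in for the collector sink, and the operator is deliberately trivial so only the test structure is shown:

``` java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the unit-test pattern: drive the lifecycle
// calls by hand and collect what the operator "emits" into a list
// that stands in for a test sink on the output port.
public class WindowTestSketch {
    // A trivial per-window word counter (no stop words, no global state),
    // used only to demonstrate the test structure.
    static class PerWindowCounter {
        Map<String, Long> counts;
        final List<Map<String, Long>> sink = new ArrayList<>();

        void setup()             { /* nothing transient to build here */ }
        void beginWindow(long w) { counts = new HashMap<>(); }
        void process(String t)   {
            for (String word : t.toLowerCase().split("\\s+")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        void endWindow()         { sink.add(counts); }  // emit once per window
    }
}
```

A test then calls setup, beginWindow, process for each tuple, and endWindow in that order, and asserts on what reached the sink.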
+On deployment (including recovery), the engine restores operator state as
+follows:
+
+1. The constructor is invoked; non-transient fields are initialized.
+2. State is copied from the checkpoint; the initialized values from step 1
+are replaced.


