lucene-commits mailing list archives

From ctarg...@apache.org
Subject [08/37] lucene-solr:branch_6x: squash merge jira/solr-10290 into master
Date Fri, 12 May 2017 14:05:16 GMT
http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/ccbc93b8/solr/solr-ref-guide/src/streaming-expressions.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/streaming-expressions.adoc b/solr/solr-ref-guide/src/streaming-expressions.adoc
new file mode 100644
index 0000000..665fbf9
--- /dev/null
+++ b/solr/solr-ref-guide/src/streaming-expressions.adoc
@@ -0,0 +1,1971 @@
+= Streaming Expressions
+:page-shortname: streaming-expressions
+:page-permalink: streaming-expressions.html
+:page-children: graph-traversal
+
+Streaming Expressions provide a simple yet powerful stream processing language for SolrCloud.
+
+Streaming expressions are a suite of functions that can be combined to perform many different parallel computing tasks. These functions are the basis for the <<parallel-sql-interface.adoc#parallel-sql-interface,Parallel SQL Interface>>.
+
+There is a growing library of functions that can be combined to implement:
+
+* Request/response stream processing
+* Batch stream processing
+* Fast interactive MapReduce
+* Aggregations (both pushed-down faceted and shuffling MapReduce)
+* Parallel relational algebra (distributed joins, intersections, unions, complements)
+* Publish/subscribe messaging
+* Distributed graph traversal
+* Machine learning and parallel iterative model training
+* Anomaly detection
+* Recommendation systems
+* Retrieve and rank services
+* Text classification and feature extraction
+* Streaming NLP
+
+Streams from outside systems can be joined with streams originating from Solr, and users can add their own stream functions by following Solr's {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/io/stream/package-summary.html[Java streaming API].
+
+[IMPORTANT]
+====
+Both streaming expressions and the streaming API are considered experimental, and the APIs are subject to change.
+====
+
+[[StreamingExpressions-StreamLanguageBasics]]
+== Stream Language Basics
+
+Streaming Expressions are composed of streaming functions which work with a Solr collection. They emit a stream of tuples (key/value Maps).
+
+Many of the provided streaming functions are designed to work with entire result sets rather than the top N results like normal search. This is supported by the <<exporting-result-sets.adoc#exporting-result-sets,/export handler>>.
+
+Some streaming functions act as stream sources to originate the stream flow. Other streaming functions act as stream decorators to wrap other stream functions and perform operations on the stream of tuples. Many stream functions can be parallelized across a worker collection. This can be particularly powerful for relational algebra functions.
+
+[[StreamingExpressions-StreamingRequestsandResponses]]
+=== Streaming Requests and Responses
+
+Solr has a `/stream` request handler that takes streaming expression requests and returns the tuples as a JSON stream. This request handler is implicitly defined, meaning there is nothing that has to be defined in `solrconfig.xml` - see <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>>.
+
+The `/stream` request handler takes one parameter, `expr`, which is used to specify the streaming expression. For example, this curl command encodes and POSTs a simple `search()` expression to the `/stream` handler:
+
+[source,bash]
+----
+curl --data-urlencode 'expr=search(enron_emails,
+                                   q="from:1800flowers*",
+                                   fl="from, to",
+                                   sort="from asc",
+                                   qt="/export")' http://localhost:8983/solr/enron_emails/stream
+----
+
+Details of the parameters for each function are included below.
+
+For the above example the `/stream` handler responded with the following JSON response:
+
+[source,json]
+----
+{"result-set":{"docs":[
+   {"from":"1800flowers.133139412@s2u2.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers.93690065@s2u2.com","to":"jtholt@ect.enron.com"},
+   {"from":"1800flowers.96749439@s2u2.com","to":"alewis@enron.com"},
+   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@1800flowers.flonetwork.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
+   {"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@shop2u.com","to":"lcampbel@enron.com"},
+   {"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
+   {"from":"1800flowers@shop2u.com","to":"ebass@enron.com"},
+   {"EOF":true,"RESPONSE_TIME":33}]}
+}
+----
+
+Note that the last tuple in the example stream above is `{"EOF":true,"RESPONSE_TIME":33}`. The `EOF` tuple indicates the end of the stream. To process the JSON response, you'll need a streaming JSON implementation, because streaming expressions are designed to return the entire result set, which may contain millions of records. In your JSON client, iterate over each doc (tuple) and check for the `EOF` tuple to determine the end of the stream.
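+
+As a minimal sketch of the EOF check described above, the following Java class counts data tuples until it meets the `EOF` tuple. The class `EofTupleReader` and its methods are hypothetical and not part of SolrJ, and the sketch assumes each tuple has already been parsed into a `Map` by whichever streaming JSON parser you use.
+
```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: consumes tuples until the EOF marker.
final class EofTupleReader {

    // Returns true when the tuple is the end-of-stream marker,
    // e.g. {"EOF":true,"RESPONSE_TIME":33}.
    static boolean isEof(Map<String, Object> tuple) {
        return Boolean.TRUE.equals(tuple.get("EOF"));
    }

    // Counts the data tuples seen before the EOF tuple. The iterator is assumed
    // to be fed lazily by a streaming JSON parser, so the whole result set is
    // never held in memory at once.
    static long countTuples(Iterator<Map<String, Object>> tuples) {
        long count = 0;
        while (tuples.hasNext()) {
            Map<String, Object> tuple = tuples.next();
            if (isEof(tuple)) {
                break; // stop reading: nothing follows the EOF tuple
            }
            count++; // a real client would process the tuple here
        }
        return count;
    }
}
```
+
+A real client would replace the counter with its own per-tuple processing; the point is only that the loop ends on the `EOF` tuple rather than on a known row count.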
+
+The {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/io/package-summary.html[`org.apache.solr.client.solrj.io`] package provides Java classes that compile streaming expressions into streaming API objects. These classes can be used to execute streaming expressions from inside a Java application. For example:
+
+[source,java]
+----
+StreamFactory streamFactory = new StreamFactory().withCollectionZkHost("collection1", zkServer.getZkAddress())
+    .withStreamFunction("search", CloudSolrStream.class)
+    .withStreamFunction("unique", UniqueStream.class)
+    .withStreamFunction("top", RankStream.class)
+    .withStreamFunction("group", ReducerStream.class)
+    .withStreamFunction("parallel", ParallelStream.class);
+
+ParallelStream pstream = (ParallelStream)streamFactory.constructStream("parallel(collection1, group(search(collection1, q=\"*:*\", fl=\"id,a_s,a_i,a_f\", sort=\"a_s asc,a_f asc\", partitionKeys=\"a_s\"), by=\"a_s asc\"), workers=\"2\", zkHost=\""+zkHost+"\", sort=\"a_s asc\")");
+----
+
+[[StreamingExpressions-DataRequirements]]
+=== Data Requirements
+
+Because streaming expressions rely on the `/export` handler, many of the field and field type requirements to use `/export` are also requirements for `/stream`, particularly for `sort` and `fl` parameters. Please see the section <<exporting-result-sets.adoc#exporting-result-sets,Exporting Result Sets>> for details.
+
+[[StreamingExpressions-StreamSources]]
+== Stream Sources
+
+Stream sources originate streams.
+
+[[StreamingExpressions-echo]]
+=== echo
+//TODO
+
+[[StreamingExpressions-search]]
+=== search
+
+The `search` function searches a SolrCloud collection and emits a stream of tuples that match the query. This is very similar to a standard Solr query, and uses many of the same parameters.
+
+This expression allows you to specify a request handler using the `qt` parameter. By default, the `/select` handler is used. The `/select` handler can be used for simple rapid prototyping of expressions. For production, however, you will most likely want to use the `/export` handler, which is designed to `sort` and `export` entire result sets. The `/export` handler is not used by default because it has stricter requirements than the `/select` handler, so it's not as easy to get started with. To read more about the `/export` handler requirements, review the section <<exporting-result-sets.adoc#exporting-result-sets,Exporting Result Sets>>.
+
+[[StreamingExpressions-Parameters]]
+==== Parameters
+
+* `collection`: (Mandatory) the collection being searched.
+* `q`: (Mandatory) The query to perform on the Solr index.
+* `fl`: (Mandatory) The list of fields to return.
+* `sort`: (Mandatory) The sort criteria.
+* `zkHost`: Only needs to be defined if the collection being searched is found in a different zkHost than the local stream handler.
+* `qt`: Specifies the query type, or request handler, to use. Set this to `/export` to work with large result sets. The default is `/select`.
+* `rows`: (Mandatory with the `/select` handler) The rows parameter specifies how many rows to return. This parameter is only needed with the `/select` handler (which is the default) since the `/export` handler always returns all rows.
+* `partitionKeys`: Comma delimited list of keys to partition the search results by. To be used with the parallel function for parallelizing operations across worker nodes. See the <<StreamingExpressions-parallel,parallel>> function for details.
+
+[[StreamingExpressions-Syntax]]
+==== Syntax
+
+[source,text]
+----
+expr=search(collection1,
+       zkHost="localhost:9983",
+       qt="/export",
+       q="*:*",
+       fl="id,a_s,a_i,a_f",
+       sort="a_f asc, a_i asc")
+----
+
+
+=== shuffle
+//TODO
+
+[[StreamingExpressions-jdbc]]
+=== jdbc
+
+The `jdbc` function searches a JDBC datasource and emits a stream of tuples representing the JDBC result set. Each row in the result set is translated into a tuple and each tuple contains all the cell values for that row.
+
+[[StreamingExpressions-Parameters.1]]
+==== Parameters
+
+* `connection`: (Mandatory) JDBC formatted connection string to whatever driver you are using.
+* `sql`: (Mandatory) query to pass off to the JDBC endpoint
+* `sort`: (Mandatory) The sort criteria indicating how the data coming out of the JDBC stream is sorted
+* `driver`: The name of the JDBC driver used for the connection. If provided, an attempt is made to load the driver class into the JVM. If not provided, the driver is assumed to be already loaded into the JVM. Some drivers require explicit loading, so this option is provided.
+* `[driverProperty]`: One or more properties to pass to the JDBC driver during connection. The format is `propertyName="propertyValue"`. You can provide as many of these properties as you'd like and they will all be passed to the connection.
+
+[[StreamingExpressions-ConnectionsandDrivers]]
+==== Connections and Drivers
+
+Because some JDBC drivers require explicit loading, the `driver` parameter can be used to provide the driver class name. If provided, the driver is loaded during stream construction. If the driver cannot be loaded because the class is not found on the classpath, stream construction fails.
+
+When the JDBC stream is opened it will validate that a driver can be found for the provided connection string. If a driver cannot be found (because it hasn't been loaded) then the open will fail.
+
+[[StreamingExpressions-Datatypes]]
+==== Datatypes
+
+Due to the inherent differences in datatypes across JDBC sources the following datatypes are supported. The table indicates what Java type will be used for a given JDBC type. Types marked as requiring conversion will go through a conversion for each value of that type. For performance reasons the cell data types are only considered when the stream is opened as this is when the converters are created.
+
+[width="100%",options="header",]
+|===
+|JDBC Type |Java Type |Requires Conversion
+|String |String |No
+|Short |Long |Yes
+|Integer |Long |Yes
+|Long |Long |No
+|Float |Double |Yes
+|Double |Double |No
+|Boolean |Boolean |No
+|===
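+
+The conversions in the table can be sketched in Java as follows. This is an illustration only: the class `JdbcValueConverter` is hypothetical and is not the actual stream implementation. For brevity it inspects each value as it arrives, whereas the text above says the real converters are chosen once, when the stream is opened.
+
```java
// Hypothetical sketch of the conversions in the table above;
// not the actual JDBC stream source code.
final class JdbcValueConverter {

    // Maps a JDBC cell value to the Java type the table lists for it.
    static Object convert(Object value) {
        if (value instanceof Short || value instanceof Integer) {
            return ((Number) value).longValue();   // Short, Integer -> Long
        }
        if (value instanceof Float) {
            return ((Number) value).doubleValue(); // Float -> Double
        }
        return value; // String, Long, Double, Boolean pass through unchanged
    }
}
```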
+
+[[StreamingExpressions-Syntax.1]]
+==== Syntax
+
+A basic `jdbc` expression:
+
+[source,text]
+----
+jdbc(
+    connection="jdbc:hsqldb:mem:.",
+    sql="select NAME, ADDRESS, EMAIL, AGE from PEOPLE where AGE > 25 order by AGE, NAME DESC",
+    sort="AGE asc, NAME desc",
+    driver="org.hsqldb.jdbcDriver"
+)
+----
+
+A `jdbc` expression that passes a property to the driver:
+
+[source,text]
+----
+// get_column_name is a property to pass to the hsqldb driver
+jdbc(
+    connection="jdbc:hsqldb:mem:.",
+    sql="select NAME as FIRST_NAME, ADDRESS, EMAIL, AGE from PEOPLE where AGE > 25 order by AGE, NAME DESC",
+    sort="AGE asc, NAME desc",
+    driver="org.hsqldb.jdbcDriver",
+    get_column_name="false"
+)
+----
+
+[[StreamingExpressions-facet]]
+=== facet
+
+The `facet` function provides aggregations that are rolled up over buckets. Under the covers the facet function pushes down the aggregation into the search engine using Solr's JSON Facet API. This provides sub-second performance for many use cases. The facet function is appropriate for use with a low to moderate number of distinct values in the bucket fields. To support high cardinality aggregations see the rollup function.
+
+[[StreamingExpressions-Parameters.2]]
+==== Parameters
+
+* `collection`: (Mandatory) Collection the facets will be aggregated from.
+* `q`: (Mandatory) The query to build the aggregations from.
+* `buckets`: (Mandatory) Comma separated list of fields to rollup over. The comma separated list represents the dimensions in a multi-dimensional rollup.
+* `bucketSorts`: Comma separated list of sorts to apply to each dimension in the buckets parameters. Sorts can be on the computed metrics or on the bucket values.
+* `bucketSizeLimit`: The number of buckets to include. This value is applied to each dimension.
+* `metrics`: List of metrics to compute for the buckets. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`.
+
+[[StreamingExpressions-Syntax.2]]
+==== Syntax
+
+Example 1:
+
+[source,text]
+----
+facet(collection1,
+      q="*:*",
+      buckets="a_s",
+      bucketSorts="sum(a_i) desc",
+      bucketSizeLimit=100,
+      sum(a_i),
+      sum(a_f),
+      min(a_i),
+      min(a_f),
+      max(a_i),
+      max(a_f),
+      avg(a_i),
+      avg(a_f),
+      count(*))
+----
+
+The example above shows a facet function with rollups over a single bucket, where the buckets are returned in descending order by the calculated value of the `sum(a_i)` metric.
+
+Example 2:
+
+[source,text]
+----
+facet(collection1,
+      q="*:*",
+      buckets="year_i, month_i, day_i",
+      bucketSorts="year_i desc, month_i desc, day_i desc",
+      bucketSizeLimit=100,
+      sum(a_i),
+      sum(a_f),
+      min(a_i),
+      min(a_f),
+      max(a_i),
+      max(a_f),
+      avg(a_i),
+      avg(a_f),
+      count(*))
+----
+
+The example above shows a facet function with rollups over three buckets, where the buckets are returned in descending order by bucket value.
+
+[[StreamingExpressions-features]]
+=== features
+
+The `features` function extracts the key terms from a text field in a classification training set stored in a SolrCloud collection. It uses an algorithm known as *Information Gain* to select the important terms from the training set. The `features` function was designed to work specifically with the <<StreamingExpressions-train,train>> function, which uses the extracted features to train a text classifier.
+
+The `features` function is designed to work with a training set that provides both positive and negative examples of a class. It emits a tuple for each feature term that is extracted along with the inverse document frequency (IDF) for the term in the training set.
+
+The `features` function uses a query to select the training set from a collection. The IDF for each selected feature is calculated relative to the training set matching the query. This allows multiple training sets to be stored in the same SolrCloud collection without polluting the IDF across training sets.
+
+[[StreamingExpressions-Parameters.3]]
+==== Parameters
+
+* `collection`: (Mandatory) The collection that holds the training set
+* `q`: (Mandatory) The query that defines the training set. The IDF for the features will be generated specific to the result set matching the query.
+* `featureSet`: (Mandatory) The name of the feature set. This can be used to retrieve the features if they are stored in a SolrCloud collection.
+* `field`: (Mandatory) The text field to extract the features from.
+* `outcome`: (Mandatory) The field that defines the class, positive or negative
+* `numTerms`: (Mandatory) How many feature terms to extract.
+* `positiveLabel`: (defaults to 1) The value in the outcome field that defines a positive outcome.
+
+[[StreamingExpressions-Syntax.3]]
+==== Syntax
+
+[source,text]
+----
+features(collection1,
+         q="*:*",
+         featureSet="features1",
+         field="body",
+         outcome="out_i",
+         numTerms=250)
+----
+
+[[StreamingExpressions-gatherNodes]]
+=== gatherNodes
+
+The `gatherNodes` function provides breadth-first graph traversal. For details, see the section <<graph-traversal.adoc#graph-traversal,Graph Traversal>>.
+
+[[StreamingExpressions-model]]
+=== model
+
+The `model` function retrieves and caches logistic regression text classification models that are stored in a SolrCloud collection. The `model` function is designed to work with models that are created by the <<StreamingExpressions-train,train function>>, but can also be used to retrieve text classification models trained outside of Solr, as long as they conform to the specified format. After the model is retrieved it can be used by the <<StreamingExpressions-classify,classify function>> to classify documents.
+
+A single model tuple is fetched and returned based on the *id* parameter. The model is retrieved by matching the *id* parameter with a model name in the index. If more than one iteration of the named model is stored in the index, the highest iteration is selected.
+
+[[StreamingExpressions-Caching]]
+==== Caching
+
+The `model` function has an internal LRU (least-recently-used) cache so models do not have to be retrieved with each invocation of the `model` function. The time to cache for each model ID can be passed as a parameter to the function call. Retrieving a cached model does not reset the time for expiring the model ID in the cache.
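+
+The caching behavior described above can be sketched with a `LinkedHashMap` in access order. The class `ExpiringLruCache` below is hypothetical and is not Solr's implementation; it assumes the caller supplies the current time in milliseconds, and its `cacheMillis` argument mirrors the parameter of the same name.
+
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU cache with per-entry expiry; not Solr's code.
final class ExpiringLruCache<V> {

    // A cached value together with the time at which it expires.
    private static final class Slot<T> {
        final T value;
        final long expiresAtMillis;

        Slot(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final LinkedHashMap<String, Slot<V>> slots;

    ExpiringLruCache(final int maxSize) {
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.slots = new LinkedHashMap<String, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Slot<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Stores a value; the expiry time is fixed here and never extended later.
    void put(String id, V value, long nowMillis, long cacheMillis) {
        slots.put(id, new Slot<>(value, nowMillis + cacheMillis));
    }

    // Returns the cached value, or null if it is absent or expired.
    // A hit refreshes the LRU position but does not reset the expiry time.
    V get(String id, long nowMillis) {
        Slot<V> slot = slots.get(id);
        if (slot == null) {
            return null;
        }
        if (nowMillis >= slot.expiresAtMillis) {
            slots.remove(id);
            return null;
        }
        return slot.value;
    }
}
```
+
+Keeping the expiry timestamp separate from the LRU ordering is what lets a frequently read model stay in the cache by recency and still be re-fetched once its time to cache has passed.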
+
+[[StreamingExpressions-ModelStorage]]
+==== Model Storage
+
+The storage format of the models in Solr is below. The `train` function outputs the format below so you only need to know schema details if you plan to use the `model` function with logistic regression models trained outside of Solr.
+
+* `name_s` (Single value, String, Stored): The name of the model.
+* `iteration_i` (Single value, Integer, Stored): The iteration number of the model. Solr can store all iterations of the models generated by the train function.
+* `terms_ss` (Multi value, String, Stored): The array of terms/features of the model.
+* `weights_ds` (Multi value, double, Stored): The array of term weights. Each weight corresponds by array index to a term.
+* `idfs_ds` (Multi value, double, Stored): The array of term IDFs (Inverse document frequency). Each IDF corresponds by array index to a term.
+
+[[StreamingExpressions-Parameters.4]]
+==== Parameters
+
+* `collection`: (Mandatory) The collection where the model is stored.
+* `id`: (Mandatory) The id/name of the model. The model function always returns one model. If there are multiple iterations of the name, the highest iteration is returned.
+* `cacheMillis`: (Optional) The amount of time to cache the model in the LRU cache.
+
+[[StreamingExpressions-Syntax.4]]
+==== Syntax
+
+[source,text]
+----
+model(modelCollection,
+      id="myModel",
+      cacheMillis="200000")
+----
+
+[[StreamingExpressions-random]]
+=== random
+
+The `random` function searches a SolrCloud collection and emits a pseudo-random set of results that match the query. Each invocation of random will return a different pseudo-random result set.
+
+[[StreamingExpressions-Parameters.5]]
+==== Parameters
+
+* `collection`: (Mandatory) The collection the pseudo-random results will be drawn from.
+* `q`: (Mandatory) The query that selects the documents to draw the results from.
+* `rows`: (Mandatory) The number of pseudo-random results to return.
+* `fl`: (Mandatory) The field list to return.
+* `fq`: (Optional) Filter query
+
+[[StreamingExpressions-Syntax.5]]
+==== Syntax
+
+[source,text]
+----
+random(baskets,
+       q="productID:productX",
+       rows="100",
+       fl="basketID")
+----
+
+In the example above, the `random` function searches the `baskets` collection for all rows that match `productID:productX` and returns 100 pseudo-random results. The field list returned is `basketID`.
+
+[[StreamingExpressions-significantTerms]]
+=== significantTerms
+
+The `significantTerms` function queries a SolrCloud collection, but instead of returning documents, it returns significant terms found in documents in the result set. The `significantTerms` function scores terms based on how frequently they appear in the result set and how rarely they appear in the entire corpus. It emits a tuple for each term which contains the term, the score, the foreground count and the background count. The foreground count is the number of documents in the result set that contain the term. The background count is the number of documents in the entire corpus that contain the term. The foreground and background counts are global for the collection.
+
+[[StreamingExpressions-Parameters.6]]
+==== Parameters
+
+* `collection`: (Mandatory) The collection that the function is run on.
+* `q`: (Mandatory) The query that describes the foreground document set.
+* `limit`: (Optional, Default 20) The max number of terms to return.
+* `minDocFreq`: (Optional, Defaults to 5 documents) The minimum number of documents the term must appear in on a shard. This is a float value. If greater than 1.0, it is treated as the absolute number of documents. If less than 1.0, it is treated as a percentage of documents.
+* `maxDocFreq`: (Optional, Defaults to 30% of documents) The maximum number of documents the term can appear in on a shard. This is a float value. If greater than 1.0, it is treated as the absolute number of documents. If less than 1.0, it is treated as a percentage of documents.
+* `minTermLength`: (Optional, Default 4) The minimum length of the term to be considered significant.
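+
+The two readings of `minDocFreq` and `maxDocFreq` can be sketched as a small Java helper. The class `DocFreqThreshold` is hypothetical and is not Solr's implementation. Two details are assumptions, because the parameter descriptions only cover values above and below 1.0: a value of exactly 1.0 is treated here as an absolute count, and fractional results are rounded to the nearest document.
+
```java
// Hypothetical helper illustrating how a minDocFreq/maxDocFreq value
// could be interpreted; not taken from the Solr source.
final class DocFreqThreshold {

    // Resolves the parameter to an absolute document count for one shard.
    // Assumption: exactly 1.0 counts as an absolute value, since the
    // documentation only describes values above and below 1.0.
    static long resolve(double value, long numDocsOnShard) {
        if (value < 1.0) {
            // Fraction of the documents on the shard (rounding is an assumption).
            return Math.round(value * numDocsOnShard);
        }
        return (long) value; // absolute number of documents
    }
}
```
+
+Under these assumptions, `resolve(10.0, 1000)` is read as 10 documents, while `resolve(0.25, 1000)` is read as a quarter of the 1000 documents on the shard.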
+
+[[StreamingExpressions-Syntax.6]]
+==== Syntax
+
+[source,text]
+----
+significantTerms(collection1,
+                 q="body:Solr",
+                 minDocFreq="10",
+                 maxDocFreq=".20",
+                 minTermLength="5")
+----
+
+In the example above, the `significantTerms` function queries `collection1` and returns at most 20 significant terms (the default `limit`, since none is specified) that are at least 5 characters long and appear in 10 or more documents but in no more than 20% of the corpus.
+
+[[StreamingExpressions-shortestPath]]
+=== shortestPath
+
+The `shortestPath` function is an implementation of a shortest path graph traversal. The `shortestPath` function performs an iterative breadth-first search through an unweighted graph to find the shortest paths between two nodes in a graph. The `shortestPath` function emits a tuple for each path found. Each tuple emitted will contain a `path` key which points to a `List` of nodeIDs comprising the path.
+
+[[StreamingExpressions-Parameters.7]]
+==== Parameters
+
+* `collection`: (Mandatory) The collection that the traversal will be run on.
+* `from`: (Mandatory) The nodeID to start the search from
+* `to`: (Mandatory) The nodeID to end the search at
+* `edge`: (Mandatory) Syntax: `from_field=to_field`. The `from_field` defines which field to search from. The `to_field` defines which field to search to. See example below for a detailed explanation.
+* `threads`: (Optional : Default 6) The number of threads used to perform the partitioned join in the traversal.
+* `partitionSize`: (Optional : Default 250) The number of nodes in each partition of the join.
+* `fq`: (Optional) Filter query
+* `maxDepth`: (Mandatory) Limits the search to a maximum depth in the graph.
+
+[[StreamingExpressions-Syntax.7]]
+==== Syntax
+
+[source,text]
+----
+shortestPath(collection,
+             from="john@company.com",
+             to="jane@company.com",
+             edge="from_address=to_address",
+             threads="6",
+             partitionSize="300",
+             fq="limiting query",
+             maxDepth="4")
+----
+
+The expression above performs a breadth-first search to find the shortest paths in an unweighted, directed graph.
+
+The search starts from the nodeID "\john@company.com" in the `from_address` field and searches for the nodeID "\jane@company.com" in the `to_address` field. This search is performed iteratively until the `maxDepth` has been reached. Each level in the traversal is implemented as a parallel partitioned nested loop join across the entire collection. The `threads` parameter controls the number of threads performing the join at each level, while the `partitionSize` parameter controls the number of nodes in each join partition. The `maxDepth` parameter controls the number of levels to traverse. `fq` is a limiting query applied to each level in the traversal.
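+
+The traversal can be illustrated with an in-memory breadth-first search. This is a simplified sketch, not how Solr executes the function: the hypothetical class `ShortestPathSketch` reads edges from a `Map` instead of running a partitioned join over a collection, returns only one shortest path although the function emits a tuple for each path found, and assumes that `maxDepth` means the maximum number of edges in a path.
+
```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of a depth-limited breadth-first search.
final class ShortestPathSketch {

    // Returns one shortest path from 'from' to 'to' that uses at most maxDepth
    // edges, or an empty list if no such path exists. 'edges' maps a node to
    // the nodes it points to (directed, unweighted).
    static List<String> shortestPath(Map<String, List<String>> edges,
                                     String from, String to, int maxDepth) {
        Map<String, String> parent = new HashMap<>();  // node -> node it was reached from
        Map<String, Integer> depth = new HashMap<>();  // node -> edges from the start
        Deque<String> queue = new ArrayDeque<>();
        depth.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(to)) {
                // Walk the parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            int d = depth.get(node);
            if (d == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (String next : edges.getOrDefault(node, Collections.emptyList())) {
                if (!depth.containsKey(next)) { // the first visit is the shortest route
                    depth.put(next, d + 1);
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```
+
+Because every edge has the same cost, the first time the search reaches the target it has found a path of minimal length, which is why a plain queue is enough and no priority queue is needed.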
+
+[[StreamingExpressions-stats]]
+=== stats
+
+The `stats` function gathers simple aggregations for a search result set. The stats function does not support rollups over buckets, so the stats stream always returns a single tuple with the rolled up stats. Under the covers the stats function pushes down the generation of the stats into the search engine using the StatsComponent. The stats function currently supports the following metrics: `count(*)`, `sum()`, `avg()`, `min()`, and `max()`.
+
+[[StreamingExpressions-Parameters.8]]
+==== Parameters
+
+* `collection`: (Mandatory) Collection the stats will be aggregated from.
+* `q`: (Mandatory) The query to build the aggregations from.
+* `metrics`: (Mandatory) The metrics to include in the result tuple. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)` and `count(*)`.
+
+[[StreamingExpressions-Syntax.8]]
+==== Syntax
+
+[source,text]
+----
+stats(collection1,
+      q="*:*",
+      sum(a_i),
+      sum(a_f),
+      min(a_i),
+      min(a_f),
+      max(a_i),
+      max(a_f),
+      avg(a_i),
+      avg(a_f),
+      count(*))
+----
+
+[[StreamingExpressions-timeseries]]
+=== timeseries
+
+//TODO
+
+[[StreamingExpressions-train]]
+=== train
+
+The `train` function trains a Logistic Regression text classifier on a training set stored in a SolrCloud collection. It uses a parallel iterative, batch Gradient Descent approach to train the model. The training algorithm is embedded inside Solr so with each iteration only the model is streamed across the network.
+
+The `train` function wraps a <<StreamingExpressions-features,features>> function which provides the terms and inverse document frequency (IDF) used to train the model. The `train` function operates over the same training set as the `features` function, which includes both positive and negative examples of the class.
+
+With each iteration the `train` function emits a tuple with the model. The model contains the feature terms, weights, and the confusion matrix for the model. The optimized model can then be used to classify documents based on their feature terms.
+
+[[StreamingExpressions-Parameters.9]]
+==== Parameters
+
+* `collection`: (Mandatory) Collection that holds the training set
+* `q`: (Mandatory) The query that defines the training set. The IDF for the features will be generated on the result set matching the query.
+* `name`: (Mandatory) The name of the model. This can be used to retrieve the model if it is stored in a SolrCloud collection.
+* `field`: (Mandatory) The text field to extract the features from.
+* `outcome`: (Mandatory) The field that defines the class, positive or negative
+* `maxIterations`: (Mandatory) How many training iterations to perform.
+* `positiveLabel`: (defaults to 1) The value in the outcome field that defines a positive outcome.
+
+[[StreamingExpressions-Syntax.9]]
+==== Syntax
+
+[source,text]
+----
+train(collection1,
+      features(collection1, q="*:*", featureSet="first", field="body", outcome="out_i", numTerms=250),
+      q="*:*",
+      name="model1",
+      field="body",
+      outcome="out_i",
+      maxIterations=100)
+----
+
+[[StreamingExpressions-topic]]
+=== topic
+
+The `topic` function provides publish/subscribe messaging capabilities built on top of SolrCloud. The topic function allows users to subscribe to a query. The function then provides one-time delivery of new or updated documents that match the topic query. The initial call to the topic function establishes the checkpoints for the specific topic ID. Subsequent calls to the same topic ID will return documents added or updated after the initial checkpoint. Each run of the topic query updates the checkpoints for the topic ID. Setting the initialCheckpoint parameter to 0 will cause the topic to process all documents in the index that match the topic query.
+
+[WARNING]
+====
+The topic function should be considered in beta until https://issues.apache.org/jira/browse/SOLR-8709[SOLR-8709] is committed and released.
+====
+
+[[StreamingExpressions-Parameters.10]]
+==== Parameters
+
+* `checkpointCollection`: (Mandatory) The collection where the topic checkpoints are stored.
+* `collection`: (Mandatory) The collection that the topic query will be run on.
+* `id`: (Mandatory) The unique ID for the topic. The checkpoints will be saved under this id.
+* `q`: (Mandatory) The topic query.
+* `fl`: (Mandatory) The field list returned by the topic function.
+* `initialCheckpoint`: (Optional) Sets the initial Solr `\_version_` number to start reading from the queue. If not set, it defaults to the highest version in the index. Setting to 0 will process all records that match query in the index.
+
+[[StreamingExpressions-Syntax.10]]
+==== Syntax
+
+[source,text]
+----
+topic(checkpointCollection,
+      collection,
+      id="uniqueId",
+      q="topic query",
+      fl="id, name, country")
+----
+
+[[StreamingExpressions-StreamDecorators]]
+== Stream Decorators
+
+Stream decorators wrap other stream functions or perform operations on the stream.
+
+
+=== cartesianProduct
+//TODO
+
+=== cell
+//TODO
+
+[[StreamingExpressions-classify]]
+=== classify
+
+The `classify` function classifies tuples using a logistic regression text classification model. It was designed specifically to work with models trained using the <<StreamingExpressions-train,train function>>. The `classify` function uses the <<StreamingExpressions-model,model function>> to retrieve a stored model and then scores a stream of tuples using the model. The tuples read by the classifier must contain a text field that can be used for classification. The `classify` function uses a Lucene analyzer to extract the features from the text so the model can be applied. By default, the `classify` function looks for the analyzer using the name of the text field in the tuple. If the Solr schema on the worker node does not contain this field, the analyzer can be looked up in another field by specifying the `analyzerField` parameter.
+
+Each tuple that is classified is assigned two scores:
+
+* *probability_d*: A float between 0 and 1 which describes the probability that the tuple belongs to the class. This is useful in the classification use case.
+
+* *score_d*: The score of the document that has not been squashed between 0 and 1. The score may be positive or negative. The higher the score, the better the document fits the class. This un-squashed score is useful in query re-ranking and recommendation use cases. It is particularly useful when multiple high-ranking documents have a probability_d score of 1, which doesn't provide a meaningful ranking between documents.
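+
+The relationship between the two scores can be illustrated with the logistic (sigmoid) function, which is the usual way a logistic regression score is squashed into a probability. That `classify` uses exactly this function is an assumption here, and the class `ClassifierScores` is hypothetical.
+
```java
// Hypothetical illustration of squashing a raw score into the range 0..1;
// assumes the standard logistic function, which the text does not confirm.
final class ClassifierScores {

    // Logistic (sigmoid) function: maps any real-valued score into 0..1.
    static double squash(double score) {
        return 1.0 / (1.0 + Math.exp(-score));
    }
}
```
+
+With double-precision arithmetic, large raw scores such as 40 and 50 both squash to a value that is indistinguishable from 1, so only the un-squashed score can still rank the two documents against each other.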
+
+[[StreamingExpressions-Parameters.11]]
+==== Parameters
+
+* `model expression`: (Mandatory) Retrieves the stored logistic regression model.
+* `field`: (Mandatory) The field in the tuples to apply the classifier to. By default, the analyzer for this field in the schema will be used to extract the features.
+* `analyzerField`: (Optional) Specifies a different field to find the analyzer from in the schema.
+
+[[StreamingExpressions-Syntax.11]]
+==== Syntax
+
+[source,text]
+----
+classify(model(modelCollection,
+             id="model1",
+             cacheMillis=5000),
+         search(contentCollection,
+             q="id:(a b c)",
+             fl="text_t, id",
+             sort="id asc"),
+         field="text_t")
+----
+
+In the example above, the `classify` expression retrieves the model using the `model` function. It then classifies the tuples returned by the `search` function. The `text_t` field is used for the text classification, and the analyzer for the `text_t` field in the Solr schema is used to analyze the text and extract the features.
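+
+As a sketch of the `analyzerField` parameter described above, the variant below classifies a field whose own name is not used to look up the analyzer. The `body_s` field name is hypothetical; the analyzer is looked up under `text_t` instead:
+
+[source,text]
+----
+classify(model(modelCollection,
+             id="model1",
+             cacheMillis=5000),
+         search(contentCollection,
+             q="id:(a b c)",
+             fl="body_s, id",
+             sort="id asc"),
+         field="body_s",
+         analyzerField="text_t")
+----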
+
+[[StreamingExpressions-commit]]
+=== commit
+
+The `commit` function wraps a single stream (A) and, given a collection and a batch size, sends commit messages to the collection when the batch size is reached or the end of the stream is reached. A commit stream is most frequently used with an update stream, so the commit takes into account possible summary tuples coming from the update stream. All tuples coming into the commit stream are returned out of the commit stream: no tuples are dropped and no tuples are added.
+
+[[StreamingExpressions-Parameters.12]]
+==== Parameters
+
+* `collection`: The collection to send commit messages to (required)
+* `batchSize`: The commit batch size. A commit message is sent each time the batch size is reached. If not provided (or provided as value 0), a commit is only sent at the end of the incoming stream.
+* `waitFlush`: The value passed directly to the commit handler (true/false, default: false)
+* `waitSearcher`: The value passed directly to the commit handler (true/false, default: false)
+* `softCommit`: The value passed directly to the commit handler (true/false, default: false)
+* `StreamExpression for StreamA` (required)
+
+[[StreamingExpressions-Syntax.12]]
+==== Syntax
+
+[source,text]
+----
+commit(
+    destinationCollection,
+    batchSize=2,
+    update(
+        destinationCollection,
+        batchSize=5,
+        search(collection1, q=*:*, fl="id,a_s,a_i,a_f,s_multi,i_multi", sort="a_f asc, a_i asc")
+    )
+)
+----
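+
+As a further sketch based only on the parameters listed above, omitting `batchSize` sends a single commit once the incoming stream is exhausted. Here `softCommit=true` is passed through to the commit handler; the collection names are the same placeholders used in the example above:
+
+[source,text]
+----
+commit(
+    destinationCollection,
+    softCommit=true,
+    update(
+        destinationCollection,
+        batchSize=5,
+        search(collection1, q=*:*, fl="id,a_s,a_i,a_f,s_multi,i_multi", sort="a_f asc, a_i asc")
+    )
+)
+----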
+
+[[StreamingExpressions-complement]]
+=== complement
+
+The `complement` function wraps two streams (A and B) and emits tuples from A which do not exist in B. The tuples are emitted in the order in which they appear in stream A. Both streams must be sorted by the fields being used to determine equality (using the `on` parameter).
+
+[[StreamingExpressions-Parameters.13]]
+==== Parameters
+
+* `StreamExpression for StreamA`
+* `StreamExpression for StreamB`
+* `on`: Fields to be used for checking equality of tuples between A and B. Can be of the format `on="fieldName"`, `on="fieldNameInLeft=fieldNameInRight"`, or `on="fieldName, otherFieldName=rightOtherFieldName"`.
+
+[[StreamingExpressions-Syntax.13]]
+==== Syntax
+
+[source,text]
+----
+complement(
+  search(collection1, q=a_s:(setA || setAB), fl="id,a_s,a_i", sort="a_i asc, a_s asc"),
+  search(collection1, q=a_s:(setB || setAB), fl="id,a_s,a_i", sort="a_i asc"),
+  on="a_i"
+)
+
+complement(
+  search(collection1, q=a_s:(setA || setAB), fl="id,a_s,a_i", sort="a_i asc, a_s asc"),
+  search(collection1, q=a_s:(setB || setAB), fl="id,a_s,a_i", sort="a_i asc, a_s asc"),
+  on="a_i,a_s"
+)
+----
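+
+The `fieldNameInLeft=fieldNameInRight` form of `on` can be sketched as below. The second collection and its `b_i` field are hypothetical, chosen only to show two streams whose equality fields are named differently:
+
+[source,text]
+----
+complement(
+  search(collection1, q=*:*, fl="id,a_i", sort="a_i asc"),
+  search(collection2, q=*:*, fl="id,b_i", sort="b_i asc"),
+  on="a_i=b_i"
+)
+----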
+
+[[StreamingExpressions-daemon]]
+=== daemon
+
+The `daemon` function wraps another function and runs it at intervals using an internal thread. The `daemon` function can be used to provide both continuous push and pull streaming.
+
+[[StreamingExpressions-Continuouspushstreaming]]
+==== Continuous Push Streaming
+
+With continuous push streaming the `daemon` function wraps another function and is then sent to the `/stream` handler for execution. The `/stream` handler recognizes the `daemon` function and keeps it resident in memory, so it can run its internal function at intervals.
+
+In order to facilitate the pushing of tuples, the `daemon` function must wrap another stream decorator that pushes the tuples somewhere. One example of this is the `update` function, which wraps a stream and sends the tuples to another SolrCloud collection for indexing.
+
+[[StreamingExpressions-Syntax.14]]
+==== Syntax
+
+[source,text]
+----
+daemon(id="uniqueId",
+       runInterval="1000",
+       terminate="true",
+       update(destinationCollection,
+              batchSize=100,
+              topic(checkpointCollection,
+                    topicCollection,
+                    q="topic query",
+                    fl="id, title, abstract, text",
+                    id="topicId",
+                    initialCheckpoint=0)
+               )
+        )
+----
+
+The sample code above shows a `daemon` function wrapping an `update` function, which is wrapping a `topic` function. When this expression is sent to the `/stream` handler, the `/stream` handler sees the `daemon` function and keeps it in memory, where it will run at intervals. In this particular example, the `daemon` function will run the `update` function every second. The `update` function is wrapping a <<StreamingExpressions-topic,`topic` function>>, which will stream tuples that match the `topic` function query in batches. Each subsequent call to the topic will return the next batch of tuples for the topic. The `update` function will send all the tuples matching the topic to another collection to be indexed. The `terminate` parameter tells the daemon to terminate when the `topic` function stops sending tuples.
+
+The effect of this is to push documents that match a specific query into another collection. Custom push functions can be plugged in that push documents out of Solr and into other systems, such as Kafka or an email system.
+
+Push streaming can also be used for continuous background aggregation scenarios where aggregates are rolled up in the background at intervals and pushed to other Solr collections. Another use case is continuous background machine learning model optimization, where the optimized model is pushed to another Solr collection where it can be integrated into queries.
+
+The `/stream` handler supports a small set of commands for listing and controlling daemon functions:
+
+[source,text]
+----
+http://localhost:8983/collection/stream?action=list
+----
+
+This command will provide a listing of the daemons currently running on the specific node, along with their current state.
+
+[source,text]
+----
+http://localhost:8983/collection/stream?action=stop&id=daemonId
+----
+
+This command will stop a specific daemon function but leave it resident in memory.
+
+[source,text]
+----
+http://localhost:8983/collection/stream?action=start&id=daemonId
+----
+
+This command will start a specific daemon function that has been stopped.
+
+[source,text]
+----
+http://localhost:8983/collection/stream?action=kill&id=daemonId
+----
+
+This command will stop a specific daemon function and remove it from memory.
+
+[[StreamingExpressions-ContinousPullStreaming]]
+==== Continuous Pull Streaming
+
+The {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/io/stream/DaemonStream.html[DaemonStream] Java class (part of the SolrJ libraries) can also be embedded in a Java application to provide continuous pull streaming. Sample code:
+
+[source,java]
+----
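+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.solr.client.solrj.io.SolrClientCache;
+import org.apache.solr.client.solrj.io.Tuple;
+import org.apache.solr.client.solrj.io.stream.DaemonStream;
+import org.apache.solr.client.solrj.io.stream.StreamContext;
+import org.apache.solr.client.solrj.io.stream.TopicStream;
+
+StreamContext context = new StreamContext();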
+SolrClientCache cache = new SolrClientCache();
+context.setSolrClientCache(cache);
+
+Map topicQueryParams = new HashMap();
+topicQueryParams.put("q","hello");  // The query for the topic
+topicQueryParams.put("rows", "500"); // How many rows to fetch during each run
+topicQueryParams.put("fl", "id,title"); // The field list to return with the documents
+
+TopicStream topicStream = new TopicStream(zkHost,        // Host address for the zookeeper service housing the collections
+                                         "checkpoints",  // The collection to store the topic checkpoints
+                                         "topicData",    // The collection to query for the topic records
+                                         "topicId",      // The id of the topic
+                                         -1,             // checkpoint every X tuples, if set -1 it will checkpoint after each run.
+                                          topicQueryParams); // The query parameters for the TopicStream
+
+DaemonStream daemonStream = new DaemonStream(topicStream,             // The underlying stream to run.
+                                             "daemonId",              // The id of the daemon
+                                             1000,                    // The interval at which to run the internal stream
+                                             500);                    // The internal queue size for the daemon stream. Tuples will be placed in the queue
+                                                                      // as they are read by the internal thread.
+                                                                      // Calling read() on the daemon stream reads records from the internal queue.
+
+daemonStream.setStreamContext(context);
+
+daemonStream.open();
+
+//Read until it's time to shutdown the DaemonStream. You can define the shutdown criteria.
+while(!shutdown()) {
+    Tuple tuple = daemonStream.read(); // This will block until tuples become available from the underlying stream (TopicStream)
+                                       // The EOF tuple (signaling the end of the stream) will never occur until the DaemonStream has been shutdown.
+    //Do something with the tuples
+}
+
+// Shutdown the DaemonStream.
+daemonStream.shutdown();
+
+//Read the DaemonStream until the EOF Tuple is found.
+//This allows the underlying stream to perform an orderly shutdown.
+
+while(true) {
+    Tuple tuple = daemonStream.read();
+    if(tuple.EOF) {
+        break;
+    } else {
+        //Do something with the tuples.
+    }
+}
+//Finally close the stream
+daemonStream.close();
+----
+
+[[StreamingExpressions-eval]]
+=== eval
+
+//todo
+
+[[StreamingExpressions-executor]]
+=== executor
+
+The `executor` function wraps a stream source that contains streaming expressions, and executes the expressions in parallel. The `executor` function looks for the expression in the `expr_s` field in each tuple. The `executor` function has an internal thread pool that runs tasks that compile and run expressions in parallel on the same worker node. This function can also be parallelized across worker nodes by wrapping it in the <<StreamingExpressions-parallel,`parallel`>> function to provide parallel execution of expressions across a cluster.
+
+The `executor` function does not do anything specific with the output of the expressions that it runs. Therefore the expressions that are executed must contain the logic for pushing tuples to their destination. The <<StreamingExpressions-update,update function>> can be included in the expression being executed to send the tuples to a SolrCloud collection for storage.
+
+This model allows for asynchronous execution of jobs where the output is stored in a SolrCloud collection where it can be accessed as the job progresses.
+
+[[StreamingExpressions-Parameters.14]]
+==== Parameters
+
+* `threads`: (Optional) The number of threads in the executor's thread pool for executing expressions.
+* `StreamExpression`: (Mandatory) The stream source which contains the Streaming Expressions to execute.
+
+[[StreamingExpressions-Syntax.15]]
+==== Syntax
+
+[source,text]
+----
+daemon(id="myDaemon",
+       terminate="true",
+       executor(threads=10,
+                topic(checkpointCollection,
+                      storedExpressions,
+                      q="*:*",
+                      fl="id, expr_s",
+                      initialCheckpoint=0,
+                      id="myTopic")))
+----
+
+In the example above, a <<StreamingExpressions-daemon,daemon>> wraps an executor, which wraps a <<StreamingExpressions-topic,topic>> that returns tuples with expressions to execute. When sent to the stream handler, the daemon will call the executor at intervals, which will cause the executor to read from the topic and execute the expressions found in the `expr_s` field. The daemon will repeatedly call the executor until all the tuples that match the topic have been iterated, then it will terminate. This is the approach for executing batches of streaming expressions from a `topic` queue.
+
+[[StreamingExpressions-fetch]]
+=== fetch
+
+The `fetch` function iterates a stream and fetches additional fields and adds them to the tuples. The `fetch` function fetches in batches to limit the number of calls back to Solr. Tuples streamed from the `fetch` function will contain the original fields and the additional fields that were fetched. The `fetch` function supports one-to-one fetches. Many-to-one fetches, where the stream source contains duplicate keys, will also work, but one-to-many fetches are currently not supported by this function.
+
+[[StreamingExpressions-Parameters.15]]
+==== Parameters
+
+* `Collection`: (Mandatory) The collection to fetch the fields from.
+* `StreamExpression`: (Mandatory) The stream source for the fetch function.
+* `fl`: (Mandatory) The fields to be fetched.
+* `on`: Fields to be used for checking equality of tuples between stream source and fetched records. Formatted as `on="fieldNameInTuple=fieldNameInCollection"`.
+* `batchSize`: (Optional) The batch fetch size.
+
+[[StreamingExpressions-Syntax.16]]
+==== Syntax
+
+[source,text]
+----
+fetch(addresses,
+      search(people, q="*:*", fl="username, firstName, lastName", sort="username asc"),
+      fl="streetAddress, city, state, country, zip",
+      on="username=userId")
+----
+
+The example above fetches addresses for users by matching the username in the tuple with the userId field in the addresses collection.
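+
+The optional `batchSize` parameter can be added to the same expression to control how many tuples are looked up per call back to Solr. The value of 50 below is an arbitrary illustration, not a recommended setting:
+
+[source,text]
+----
+fetch(addresses,
+      search(people, q="*:*", fl="username, firstName, lastName", sort="username asc"),
+      fl="streetAddress, city, state, country, zip",
+      on="username=userId",
+      batchSize=50)
+----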
+
+[[StreamingExpressions-having]]
+=== having
+
+The `having` expression wraps a stream and applies a boolean operation to each tuple. It emits only tuples for which the boolean operation returns *true*.
+
+[[StreamingExpressions-Parameters.16]]
+==== Parameters
+
+* `StreamExpression`: (Mandatory) The stream source for the having function.
+* `booleanEvaluator`: (Mandatory) The following boolean operations are supported: *eq* (equals), *gt* (greater than), *lt* (less than), *gteq* (greater than or equal to), *lteq* (less than or equal to), *and*, *or*, *eor* (exclusive or), and *not*. Boolean evaluators can be nested with other evaluators to form complex boolean logic.
+
+The comparison evaluators compare the value in a specific field with a value, whether a string, number, or boolean. For example: *eq*(field1, 10), returns true if *field1* is equal to 10.
+
+[[StreamingExpressions-Syntax.17]]
+==== Syntax
+
+[source,text]
+----
+having(rollup(over=a_s,
+              sum(a_i),
+              search(collection1,
+                     q=*:*,
+                     fl="id,a_s,a_i,a_f",
+                     sort="a_s asc")),
+       and(gt(sum(a_i), 100), lt(sum(a_i), 110)))
+----
+
+In this example, the `having` expression iterates the aggregated tuples from the `rollup` expression and emits all tuples where the field `sum(a_i)` is greater than 100 and less than 110.
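+
+Because evaluators can be nested, more involved conditions can be expressed in a single `having` call. The sketch below uses only the evaluators listed above and assumes the same rolled-up stream; it keeps tuples whose `sum(a_i)` is either equal to 100 or not less than 200:
+
+[source,text]
+----
+having(rollup(over=a_s,
+              sum(a_i),
+              search(collection1,
+                     q=*:*,
+                     fl="id,a_s,a_i,a_f",
+                     sort="a_s asc")),
+       or(eq(sum(a_i), 100), not(lt(sum(a_i), 200))))
+----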
+
+[[StreamingExpressions-leftOuterJoin]]
+=== leftOuterJoin
+
+The `leftOuterJoin` function wraps two streams, Left and Right, and emits tuples from Left. If there is a tuple in Right equal (as defined by `on`) then the values in that tuple will be included in the emitted tuple. An equal tuple in Right *need not* exist for the Left tuple to be emitted. This supports one-to-one, one-to-many, many-to-one, and many-to-many left outer join scenarios. The tuples are emitted in the order in which they appear in the Left stream. Both streams must be sorted by the fields being used to determine equality (using the `on` parameter). If both tuples contain a field of the same name then the value from the Right stream will be used in the emitted tuple.
+
+You can wrap the incoming streams with a `select` function to be specific about which field values are included in the emitted tuple.
+
+[[StreamingExpressions-Parameters.17]]
+==== Parameters
+
+* `StreamExpression for StreamLeft`
+* `StreamExpression for StreamRight`
+* `on`: Fields to be used for checking equality of tuples between Left and Right. Can be of the format `on="fieldName"`, `on="fieldNameInLeft=fieldNameInRight"`, or `on="fieldName, otherFieldName=rightOtherFieldName"`.
+
+[[StreamingExpressions-Syntax.18]]
+==== Syntax
+
+[source,text]
+----
+leftOuterJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
+  on="personId"
+)
+
+leftOuterJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
+  on="personId=ownerId"
+)
+
+leftOuterJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  select(
+    search(pets, q=type:cat, fl="ownerId,name", sort="ownerId asc"),
+    ownerId,
+    name as petName
+  ),
+  on="personId=ownerId"
+)
+----
+
+[[StreamingExpressions-hashJoin]]
+=== hashJoin
+
+The `hashJoin` function wraps two streams, Left and Right. For every tuple in Left which exists in Right, it emits a tuple containing the fields of both tuples. This supports one-to-one, one-to-many, many-to-one, and many-to-many inner join scenarios. The tuples are emitted in the order in which they appear in the Left stream. The sort order of the streams does not matter. If both tuples contain a field of the same name, then the value from the Right stream will be used in the emitted tuple.
+
+You can wrap the incoming streams with a `select` function to be specific about which field values are included in the emitted tuple.
+
+The `hashJoin` function can be used when the tuples of Left and Right cannot be put in the same order. Because the tuples are out of order, this stream works by reading all tuples from the Right stream during the open operation and storing them in memory. The result is a memory footprint equal to the size of the Right stream.
+
+[[StreamingExpressions-Parameters.18]]
+==== Parameters
+
+* `StreamExpression for StreamLeft`
+* `hashed=StreamExpression for StreamRight`
+* `on`: Fields to be used for checking equality of tuples between Left and Right. Can be of the format `on="fieldName"`, `on="fieldNameInLeft=fieldNameInRight"`, or `on="fieldName, otherFieldName=rightOtherFieldName"`.
+
+[[StreamingExpressions-Syntax.19]]
+==== Syntax
+
+[source,text]
+----
+hashJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  hashed=search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
+  on="personId"
+)
+
+hashJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  hashed=search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
+  on="personId=ownerId"
+)
+
+hashJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  hashed=select(
+    search(pets, q=type:cat, fl="ownerId,name", sort="ownerId asc"),
+    ownerId,
+    name as petName
+  ),
+  on="personId=ownerId"
+)
+----
+
+[[StreamingExpressions-innerJoin]]
+=== innerJoin
+
+The `innerJoin` function wraps two streams, Left and Right. For every tuple in Left which exists in Right, it emits a tuple containing the fields of both tuples. This supports one-to-one, one-to-many, many-to-one, and many-to-many inner join scenarios. The tuples are emitted in the order in which they appear in the Left stream. Both streams must be sorted by the fields being used to determine equality (using the `on` parameter). If both tuples contain a field of the same name, then the value from the Right stream will be used in the emitted tuple. You can wrap the incoming streams with a `select` function to be specific about which field values are included in the emitted tuple.
+
+[[StreamingExpressions-Parameters.19]]
+==== Parameters
+
+* `StreamExpression for StreamLeft`
+* `StreamExpression for StreamRight`
+* `on`: Fields to be used for checking equality of tuples between Left and Right. Can be of the format `on="fieldName"`, `on="fieldNameInLeft=fieldNameInRight"`, or `on="fieldName, otherFieldName=rightOtherFieldName"`.
+
+[[StreamingExpressions-Syntax.20]]
+==== Syntax
+
+[source,text]
+----
+innerJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
+  on="personId"
+)
+
+innerJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
+  on="personId=ownerId"
+)
+
+innerJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  select(
+    search(pets, q=type:cat, fl="ownerId,name", sort="ownerId asc"),
+    ownerId,
+    name as petName
+  ),
+  on="personId=ownerId"
+)
+----
+
+[[StreamingExpressions-intersect]]
+=== intersect
+
+The `intersect` function wraps two streams, A and B, and emits tuples from A which *DO* exist in B. The tuples are emitted in the order in which they appear in stream A. Both streams must be sorted by the fields being used to determine equality (the `on` parameter). Only tuples from A are emitted.
+
+[[StreamingExpressions-Parameters.20]]
+==== Parameters
+
+* `StreamExpression for StreamA`
+* `StreamExpression for StreamB`
+* `on`: Fields to be used for checking equality of tuples between A and B. Can be of the format `on="fieldName"`, `on="fieldNameInLeft=fieldNameInRight"`, or `on="fieldName, otherFieldName=rightOtherFieldName"`.
+
+[[StreamingExpressions-Syntax.21]]
+==== Syntax
+
+[source,text]
+----
+intersect(
+  search(collection1, q=a_s:(setA || setAB), fl="id,a_s,a_i", sort="a_i asc, a_s asc"),
+  search(collection1, q=a_s:(setB || setAB), fl="id,a_s,a_i", sort="a_i asc"),
+  on="a_i"
+)
+
+intersect(
+  search(collection1, q=a_s:(setA || setAB), fl="id,a_s,a_i", sort="a_i asc, a_s asc"),
+  search(collection1, q=a_s:(setB || setAB), fl="id,a_s,a_i", sort="a_i asc, a_s asc"),
+  on="a_i,a_s"
+)
+----
+
+[[StreamingExpressions-merge]]
+=== merge
+
+The `merge` function merges two or more streaming expressions and maintains the ordering of the underlying streams. Because the order is maintained, the sorts of the underlying streams must line up with the `on` parameter provided to the `merge` function.
+
+[[StreamingExpressions-Parameters.21]]
+==== Parameters
+
+* `StreamExpression A`
+* `StreamExpression B`
+* `Optional StreamExpression C,D,....Z`
+* `on`: Sort criteria for performing the merge. Of the form `fieldName order` where order is `asc` or `desc`. Multiple fields can be provided in the form `fieldA order, fieldB order`.
+
+[[StreamingExpressions-Syntax.22]]
+==== Syntax
+
+[source,text]
+----
+# Merging two stream expressions together
+merge(
+      search(collection1,
+             q="id:(0 3 4)",
+             fl="id,a_s,a_i,a_f",
+             sort="a_f asc"),
+      search(collection1,
+             q="id:(1)",
+             fl="id,a_s,a_i,a_f",
+             sort="a_f asc"),
+      on="a_f asc")
+----
+
+[source,text]
+----
+# Merging four stream expressions together. Notice that while the sorts of each stream are not identical, they are
+# comparable. That is to say, the first N fields in each stream's sort match the N fields in the merge's on clause.
+merge(
+      search(collection1,
+             q="id:(0 3 4)",
+             fl="id,fieldA,fieldB,fieldC",
+             sort="fieldA asc, fieldB desc"),
+      search(collection1,
+             q="id:(1)",
+             fl="id,fieldA",
+             sort="fieldA asc"),
+      search(collection2,
+             q="id:(10 11 13)",
+             fl="id,fieldA,fieldC",
+             sort="fieldA asc"),
+      search(collection3,
+             q="id:(987)",
+             fl="id,fieldA,fieldC",
+             sort="fieldA asc"),
+      on="fieldA asc")
+----
+
+[[StreamingExpressions-list]]
+=== list
+// TODO
+
+[[StreamingExpressions-null]]
+=== null
+
+The `null` expression is a useful utility function for understanding bottlenecks when performing parallel relational algebra (joins, intersections, rollups, etc.). The `null` function reads all the tuples from an underlying stream and returns a single tuple with the count and processing time. Because the null stream adds minimal overhead of its own, it can be used to isolate the performance of Solr's `/export` handler. If the `/export` handler's performance is not the bottleneck, then the bottleneck is likely occurring in the workers where the stream decorators are running.
+
+The null expression can be wrapped by the parallel function and sent to worker nodes. In this scenario each worker will return one tuple with the count of tuples processed on the worker and the timing information for that worker. This gives valuable information such as:
+
+1.  As more workers are added, does the performance of the `/export` handler improve?
+2.  Are tuples being evenly distributed across the workers, or is the hash partitioning sending more documents to a single worker?
+3.  Are all workers processing data at the same speed, or is one of the workers the source of the bottleneck?
+
+[[StreamingExpressions-Parameters.22]]
+==== Parameters
+
+* `StreamExpression`: (Mandatory) The expression read by the null function.
+
+[[StreamingExpressions-Syntax.23]]
+==== Syntax
+
+[source,text]
+----
+ parallel(workerCollection,
+          null(search(collection1, q=*:*, fl="id,a_s,a_i,a_f", sort="a_s desc", qt="/export", partitionKeys="a_s")),
+          workers="20",
+          zkHost="localhost:9983",
+          sort="a_s desc")
+----
+
+The expression above shows a `parallel` function wrapping a `null` function. This will cause the `null` function to be run in parallel across 20 worker nodes. Each worker will return a single tuple with the number of tuples processed and the time it took to iterate the tuples.
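+
+Without the `parallel` wrapper, the same measurement can be sketched against a single node. Here the `null` function reads the whole stream from the `/export` handler and returns one tuple with the count and timing:
+
+[source,text]
+----
+null(search(collection1, q=*:*, fl="id,a_s,a_i,a_f", sort="a_s desc", qt="/export"))
+----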
+
+[[StreamingExpressions-outerHashJoin]]
+=== outerHashJoin
+
+The `outerHashJoin` function wraps two streams, Left and Right, and emits tuples from Left. If there is a tuple in Right equal (as defined by the `on` parameter) then the values in that tuple will be included in the emitted tuple. An equal tuple in Right *need not* exist for the Left tuple to be emitted. This supports one-to-one, one-to-many, many-to-one, and many-to-many left outer join scenarios. The tuples are emitted in the order in which they appear in the Left stream. The sort order of the streams does not matter. If both tuples contain a field of the same name, then the value from the Right stream will be used in the emitted tuple.
+
+You can wrap the incoming streams with a `select` function to be specific about which field values are included in the emitted tuple.
+
+The `outerHashJoin` function can be used when the tuples of Left and Right cannot be put in the same order. Because the tuples are out of order, this stream works by reading all tuples from the Right stream during the open operation and storing them in memory. The result is a memory footprint equal to the size of the Right stream.
+
+[[StreamingExpressions-Parameters.23]]
+==== Parameters
+
+* `StreamExpression for StreamLeft`
+* `hashed=StreamExpression for StreamRight`
+* `on`: Fields to be used for checking equality of tuples between Left and Right. Can be of the format `on="fieldName"`, `on="fieldNameInLeft=fieldNameInRight"`, or `on="fieldName, otherFieldName=rightOtherFieldName"`.
+
+[[StreamingExpressions-Syntax.24]]
+==== Syntax
+
+[source,text]
+----
+outerHashJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  hashed=search(pets, q=type:cat, fl="personId,petName", sort="personId asc"),
+  on="personId"
+)
+
+outerHashJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  hashed=search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
+  on="personId=ownerId"
+)
+
+outerHashJoin(
+  search(people, q=*:*, fl="personId,name", sort="personId asc"),
+  hashed=select(
+    search(pets, q=type:cat, fl="ownerId,name", sort="ownerId asc"),
+    ownerId,
+    name as petName
+  ),
+  on="personId=ownerId"
+)
+----
+
+[[StreamingExpressions-parallel]]
+=== parallel
+
+The `parallel` function wraps a streaming expression and sends it to N worker nodes to be processed in parallel.
+
+The parallel function requires that the `partitionKeys` parameter be provided to the underlying searches. The `partitionKeys` parameter will partition the search results (tuples) across the worker nodes. Tuples with the same values in the partitionKeys field will be shuffled to the same worker nodes.
+
+The parallel function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria of the parallel function must match up with the sort order of the tuples returned by the workers.
+
+.Worker Collections
+[TIP]
+====
+The worker nodes can be from the same collection as the data, or they can be a different collection entirely, even one that only exists for parallel streaming expressions. A worker collection can be any SolrCloud collection that has the `/stream` handler configured. Unlike normal SolrCloud collections, worker collections don't have to hold any data. Worker collections can be empty collections that exist only to execute streaming expressions.
+====
+
+[[StreamingExpressions-Parameters.24]]
+==== Parameters
+
+* `collection`: Name of the worker collection to send the StreamExpression to.
+* `StreamExpression`: Expression to send to the worker collection.
+* `workers`: Number of workers in the worker collection to send the expression to.
+* `zkHost`: (Optional) The ZooKeeper connect string where the worker collection resides.
+* `sort`: The sort criteria for ordering tuples returned by the worker nodes.
+
+[[StreamingExpressions-Syntax.25]]
+==== Syntax
+
+[source,text]
+----
+ parallel(workerCollection,
+          reduce(search(collection1, q=*:*, fl="id,a_s,a_i,a_f", sort="a_s desc", partitionKeys="a_s"),
+                 by="a_s",
+                 group(sort="a_f desc", n="4")),
+          workers="20",
+          zkHost="localhost:9983",
+          sort="a_s desc")
+----
+
+The expression above shows a `parallel` function wrapping a `reduce` function. This will cause the `reduce` function to be run in parallel across 20 worker nodes.
+
+[[StreamingExpressions-priority]]
+=== priority
+
+The `priority` function is a simple priority scheduler for the <<StreamingExpressions-executor,executor>> function. The `executor` function doesn't directly have a concept of task prioritization; instead it simply executes tasks in the order that they are read from its underlying stream. The `priority` function provides the ability to schedule a higher priority task ahead of lower priority tasks that were submitted earlier.
+
+The `priority` function wraps two <<StreamingExpressions-topic,topics>> that are both emitting tuples that contain streaming expressions to execute. The first topic is considered the higher priority task queue.
+
+Each time the `priority` function is called, it checks the higher priority task queue to see if there are any tasks to execute. If tasks are waiting in the higher priority queue then the priority function will emit the higher priority tasks. If there are no high priority tasks to run, the lower priority queue tasks are emitted.
+
+The `priority` function will only emit a batch of tasks from one of the queues each time it is called. This ensures that no lower priority tasks are executed until the higher priority queue has no tasks to run.
+
+[[StreamingExpressions-Parameters.25]]
+==== Parameters
+
+* `topic expression`: (Mandatory) the high priority task queue
+* `topic expression`: (Mandatory) the lower priority task queue
+
+[[StreamingExpressions-Syntax.26]]
+==== Syntax
+
+[source,text]
+----
+daemon(id="myDaemon",
+       executor(threads=10,
+                priority(topic(checkpointCollection, storedExpressions, q="priority:high", fl="id, expr_s", initialCheckPoint=0,id="highPriorityTasks"),
+                         topic(checkpointCollection, storedExpressions, q="priority:low", fl="id, expr_s", initialCheckPoint=0,id="lowPriorityTasks"))))
+----
+
+In the example above, the `daemon` function calls the `executor` function iteratively. Each time it is called, the `executor` function executes the tasks emitted by the `priority` function. The `priority` function wraps two topics: the first topic is the higher priority task queue and the second topic is the lower priority task queue.
+
+[[StreamingExpressions-reduce]]
+=== reduce
+
+The `reduce` function wraps an internal stream and groups tuples by common fields.
+
+Each tuple group is operated on as a single block by a pluggable reduce operation. The group operation provided with Solr implements distributed grouping functionality. The group operation also serves as an example reduce operation that can be referred to when building custom reduce operations.
+
+[IMPORTANT]
+====
+The `reduce` function relies on the sort order of the underlying stream. Accordingly, the sort order of the underlying stream must be aligned with the group by field.
+====
+
+[[StreamingExpressions-Parameters.26]]
+==== Parameters
+
+* `StreamExpression`: (Mandatory)
+* `by`: (Mandatory) A comma separated list of fields to group by.
+* `Reduce Operation`: (Mandatory)
+
+[[StreamingExpressions-Syntax.27]]
+==== Syntax
+
+[source,text]
+----
+reduce(search(collection1, q=*:*, fl="id,a_s,a_i,a_f", sort="a_s asc, a_f asc"),
+       by="a_s",
+       group(sort="a_f desc", n="4")
+)
+----
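+
+To see why the sort order matters, the following Python sketch groups adjacent tuples the way a sort-dependent reducer would. It is an illustration under stated assumptions, not Solr's code: `reduce_group` and its arguments are hypothetical, and tuples are modeled as plain dictionaries.
+
```python
from itertools import groupby


def reduce_group(tuples, by, sort_field, n):
    """Group adjacent tuples on `by`, keeping the top n ids per group.

    Hypothetical stand-in for reduce(..., by=..., group(sort=..., n=...)).
    """
    groups = []
    # groupby only merges adjacent tuples, so the input must already be
    # sorted on the `by` field for each key to form a single group.
    for key, members in groupby(tuples, key=lambda t: t[by]):
        ranked = sorted(members, key=lambda t: t[sort_field], reverse=True)
        groups.append((key, [t["id"] for t in ranked[:n]]))
    return groups


stream = [
    {"id": 1, "a_s": "x", "a_f": 1.0},
    {"id": 2, "a_s": "x", "a_f": 3.0},
    {"id": 3, "a_s": "y", "a_f": 2.0},
]
sorted_groups = reduce_group(stream, by="a_s", sort_field="a_f", n=4)

# The same tuples, no longer sorted on a_s: the "x" group is split in two.
unsorted = [stream[0], stream[2], stream[1]]
split_groups = reduce_group(unsorted, by="a_s", sort_field="a_f", n=4)
```
+
+With the sorted input the sketch produces one group per `a_s` value; with the unsorted input the `x` key appears twice, which is the failure mode the note above warns about.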
+
+[[StreamingExpressions-rollup]]
+=== rollup
+
+The `rollup` function wraps another stream function and rolls up aggregates over bucket fields. The `rollup` function relies on the sort order of the underlying stream to roll up aggregates one grouping at a time. Accordingly, the sort order of the underlying stream must match the fields in the `over` parameter of the `rollup` function.
+
+The rollup function also needs to process entire result sets in order to perform its aggregations. When the underlying stream is the `search` function, the `/export` handler can be used to provide full sorted result sets to the rollup function. This sorted approach allows the rollup function to perform aggregations over very high cardinality fields. The disadvantage of this approach is that the tuples must be sorted and streamed across the network to a worker node to be aggregated. For faster aggregation over low to moderate cardinality fields, the `facet` function can be used.
+
+[[StreamingExpressions-Parameters.27]]
+==== Parameters
+
+* `StreamExpression`: (Mandatory)
+* `over`: (Mandatory) A list of fields to group by.
+* `metrics`: (Mandatory) The list of metrics to compute. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`.
+
+[[StreamingExpressions-Syntax.28]]
+==== Syntax
+
+[source,text]
+----
+rollup(
+   search(collection1, q=*:*, fl="a_s,a_i,a_f", qt="/export", sort="a_s asc"),
+   over="a_s",
+   sum(a_i),
+   sum(a_f),
+   min(a_i),
+   min(a_f),
+   max(a_i),
+   max(a_f),
+   avg(a_i),
+   avg(a_f),
+   count(*)
+)
+----
+
+The example above shows the `rollup` function wrapping the `search` function. Notice that the `search` function is using the `/export` handler to provide the entire result set to the rollup stream. Also notice that the search function's `sort` parameter matches up with the rollup's `over` parameter. This allows the `rollup` function to roll up over the `a_s` field, one group at a time.
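+
+The "one group at a time" behavior can be sketched in Python with running accumulators. This is a simplified illustration rather than Solr's implementation: `rollup` and `finish` are hypothetical helpers, only one metric field is aggregated, and the output keys are shortened to `sum`, `min`, `max`, `avg`, and `count`.
+
```python
def finish(acc):
    """Add the average to a completed accumulator (hypothetical helper)."""
    acc["avg"] = acc["sum"] / acc["count"]
    return acc


def rollup(tuples, over, field):
    """Yield one aggregate per run of equal `over` values.

    Only the accumulator for the current group is held in memory, which
    is why the input must be sorted on the `over` field.
    """
    current = None
    for t in tuples:
        if current is None or current[over] != t[over]:
            if current is not None:
                yield finish(current)
            current = {over: t[over], "sum": 0, "min": t[field],
                       "max": t[field], "count": 0}
        current["sum"] += t[field]
        current["min"] = min(current["min"], t[field])
        current["max"] = max(current["max"], t[field])
        current["count"] += 1
    if current is not None:
        yield finish(current)


stream = [
    {"a_s": "a", "a_i": 1},
    {"a_s": "a", "a_i": 3},
    {"a_s": "b", "a_i": 5},
]
result = list(rollup(stream, over="a_s", field="a_i"))
```
+
+Because only a single accumulator is live at any moment, memory use in this sketch does not grow with the number of distinct groups.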
+
+[[StreamingExpressions-scoreNodes]]
+=== scoreNodes
+
+See section in <<graph-traversal.adoc#GraphTraversal-UsingthescoreNodesFunctiontoMakeaRecommendation,graph traversal>>.
+
+[[StreamingExpressions-select]]
+=== select
+
+The `select` function wraps a streaming expression and outputs tuples containing a subset or modified set of fields from the incoming tuples. The list of fields included in the output tuple can contain aliases to effectively rename fields. The select stream supports both operations and evaluators. One can provide a list of operations and evaluators to perform on any fields, such as `replace`, `add`, and `if`.
+
+[[StreamingExpressions-Parameters.28]]
+==== Parameters
+
+* `StreamExpression`
+* `fieldName`: name of field to include in the output tuple (can include multiple of these), such as `outputTuple[fieldName] = inputTuple[fieldName]`
+* `fieldName as aliasFieldName`: aliased field name to include in the output tuple (can include multiple of these), such as `outputTuple[aliasFieldName] = incomingTuple[fieldName]`
+* `replace(fieldName, value, withValue=replacementValue)`: if `incomingTuple[fieldName] == value` then `outgoingTuple[fieldName]` will be set to `replacementValue`. `value` can be the string "null" to replace a null value with some other value.
+* `replace(fieldName, value, withField=otherFieldName)`: if `incomingTuple[fieldName] == value` then `outgoingTuple[fieldName]` will be set to the value of `incomingTuple[otherFieldName]`. `value` can be the string "null" to replace a null value with some other value.
+
+[[StreamingExpressions-Syntax.29]]
+==== Syntax
+
+[source,text]
+----
+// output tuples with fields teamName, wins, losses, and winPercentage, where a null value for wins or losses is translated to the value of 0
+select(
+  search(collection1, fl="id,teamName_s,wins,losses", q="*:*", sort="id asc"),
+  teamName_s as teamName,
+  wins,
+  losses,
+  replace(wins,null,withValue=0),
+  replace(losses,null,withValue=0),
+  if(eq(0,wins), 0, div(add(wins,losses), wins)) as winPercentage
+)
+----
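+
+The field selection, aliasing, and `replace` behavior described in the parameter list can be modeled in Python as follows. This is a sketch under assumptions, not Solr's implementation: the `select` function here, its `fields` mapping, and its `replacements` mapping are hypothetical, and evaluators such as `if` are left out.
+
```python
def select(tuples, fields, replacements):
    """Project and rename fields, replacing matching values.

    fields:       input field name -> output (alias) field name
    replacements: input field name -> (value_to_match, replacement_value)
    Both mappings are hypothetical conveniences for this sketch.
    """
    for t in tuples:
        out = {}
        for name, alias in fields.items():
            value = t.get(name)
            if name in replacements:
                match, with_value = replacements[name]
                if value == match:
                    value = with_value
            out[alias] = value
        yield out


rows = [{"id": "1", "teamName_s": "Owls", "wins": 3, "losses": None}]
result = list(select(
    rows,
    fields={"teamName_s": "teamName", "wins": "wins", "losses": "losses"},
    replacements={"wins": (None, 0), "losses": (None, 0)},
))
```
+
+The `id` field is dropped because it is not listed, `teamName_s` is emitted under its alias, and the null `losses` value becomes 0.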
+
+[[StreamingExpressions-sort]]
+=== sort
+
+The `sort` function wraps a streaming expression and re-orders the tuples. The `sort` function emits all incoming tuples in the new sort order. It reads all tuples from the incoming stream, re-orders them using an algorithm with `O(n log(n))` performance characteristics, where n is the total number of tuples in the incoming stream, and then outputs the tuples in the new sort order. Because all tuples are read into memory, the memory consumption of this function grows linearly with the number of tuples in the incoming stream.
+
+[[StreamingExpressions-Parameters.29]]
+==== Parameters
+
+* `StreamExpression`
+* `by`: Sort criteria for re-ordering the tuples
+
+[[StreamingExpressions-Syntax.30]]
+==== Syntax
+
+The expression below finds dog owners and orders the results by owner and pet name. Notice that it enables an efficient `innerJoin` by first ordering both streams by the person/owner id, and then re-orders the final output by the owner and pet names.
+
+[source,text]
+----
+sort(
+  innerJoin(
+    search(people, q=*:*, fl="id,name", sort="id asc"),
+    search(pets, q=type:dog, fl="owner,petName", sort="owner asc"),
+    on="id=owner"
+  ),
+  by="name asc, petName asc"
+)
+----
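+
+The Python sketch below illustrates the idea behind the expression above: a join over two inputs that are already sorted on the join key, followed by a full in-memory re-sort. The merge join shown is one common technique for sorted inputs and is an assumption about the general approach, not a description of Solr's internals; `inner_join` and the sample data are hypothetical.
+
```python
def inner_join(left, right, left_key, right_key):
    """Merge join two lists sorted ascending on their join keys."""
    joined, j = [], 0
    for l in left:
        # Skip right-hand tuples whose key is smaller than the current one.
        while j < len(right) and right[j][right_key] < l[left_key]:
            j += 1
        # Emit every right-hand tuple that shares the current key.
        k = j
        while k < len(right) and right[k][right_key] == l[left_key]:
            joined.append({**l, **right[k]})
            k += 1
    return joined


people = [{"id": 1, "name": "Zoe"}, {"id": 2, "name": "Al"},
          {"id": 3, "name": "Bo"}]
pets = [{"owner": 1, "petName": "Rex"}, {"owner": 1, "petName": "Ace"},
        {"owner": 3, "petName": "Max"}]

joined = inner_join(people, pets, "id", "owner")
# The re-sort reads everything into memory, like the sort function.
ordered = sorted(joined, key=lambda t: (t["name"], t["petName"]))
pairs = [(t["name"], t["petName"]) for t in ordered]
```
+
+The join makes a single forward pass over each input, while the final `sorted` call holds every joined tuple in memory at once.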
+
+[[StreamingExpressions-top]]
+=== top
+
+The `top` function wraps a streaming expression and re-orders the tuples. The top function emits only the top N tuples in the new sort order. The top function re-orders the underlying stream so the sort criteria *does not* have to match up with the underlying stream.
+
+[[StreamingExpressions-Parameters.30]]
+==== Parameters
+
+* `n`: Number of top tuples to return.
+* `StreamExpression`
+* `sort`: Sort criteria for selecting the top N tuples.
+
+[[StreamingExpressions-Syntax.31]]
+==== Syntax
+
+The expression below finds the top 3 results of the underlying search. Notice that it reverses the sort order. The top function re-orders the results of the underlying stream.
+
+[source,text]
+----
+top(n=3,
+    search(collection1,
+           q="*:*",
+           qt="/export",
+           fl="id,a_s,a_i,a_f",
+           sort="a_f desc, a_i desc"),
+    sort="a_f asc, a_i asc")
+----
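+
+A top-N selection that does not depend on the incoming order can be sketched in Python with a bounded heap. This is an illustration only; the `top` helper and the sample tuples are hypothetical, and the sketch says nothing about how Solr implements the function internally.
+
```python
import heapq


def top(tuples, n, key):
    """Return the n smallest tuples under `key`, in ascending order."""
    # nsmallest keeps at most n candidates at a time, so memory is
    # bounded by n rather than by the size of the stream.
    return heapq.nsmallest(n, tuples, key=key)


# Incoming order is "a_f desc, a_i desc", as in the expression above.
stream = [
    {"id": 1, "a_f": 9.0, "a_i": 2},
    {"id": 2, "a_f": 5.0, "a_i": 7},
    {"id": 3, "a_f": 5.0, "a_i": 1},
    {"id": 4, "a_f": 1.0, "a_i": 4},
]

# Requested order is "a_f asc, a_i asc".
result = top(stream, n=3, key=lambda t: (t["a_f"], t["a_i"]))
ids = [t["id"] for t in result]  # [4, 3, 2]
```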
+
+[[StreamingExpressions-unique]]
+=== unique
+
+The `unique` function wraps a streaming expression and emits a unique stream of tuples based on the `over` parameter. The unique function relies on the sort order of the underlying stream. The `over` parameter must match up with the sort order of the underlying stream.
+
+The `unique` function implements a non-co-located unique algorithm. This means that records with the same unique `over` field do not need to be co-located on the same shard. When executed in parallel, the `partitionKeys` parameter must be the same as the unique `over` field so that records with the same keys will be shuffled to the same worker.
+
+[[StreamingExpressions-Parameters.31]]
+==== Parameters
+
+* `StreamExpression`
+* `over`: The unique criteria.
+
+[[StreamingExpressions-Syntax.32]]
+==== Syntax
+
+[source,text]
+----
+unique(
+  search(collection1,
+         q="*:*",
+         qt="/export",
+         fl="id,a_s,a_i,a_f",
+         sort="a_f asc, a_i asc"),
+  over="a_f")
+----
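+
+The dependence on sort order can be illustrated with a Python sketch that compares each tuple only with the previous one. The `unique` helper below is hypothetical and is meant as a model of a sort-dependent de-duplication, not as Solr's implementation.
+
```python
def unique(tuples, over):
    """Emit the first tuple of each run of equal `over` values."""
    previous = object()  # sentinel that never equals a field value
    for t in tuples:
        if t[over] != previous:
            yield t
            previous = t[over]


stream = [
    {"id": 1, "a_f": 1.0},
    {"id": 2, "a_f": 1.0},
    {"id": 3, "a_f": 2.0},
]
ids = [t["id"] for t in unique(stream, "a_f")]

# Without the matching sort order, a duplicate a_f value slips through.
unsorted = [stream[0], stream[2], stream[1]]
unsorted_ids = [t["id"] for t in unique(unsorted, "a_f")]
```
+
+On the sorted input the sketch emits ids 1 and 3; on the unsorted input it emits all three tuples, because equal values are no longer adjacent.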
+
+[[StreamingExpressions-update]]
+=== update
+
+The `update` function wraps another function and sends the tuples to a SolrCloud collection for indexing.
+
+[[StreamingExpressions-Parameters.32]]
+==== Parameters
+
+* `destinationCollection`: (Mandatory) The collection where the tuples will be indexed.
+* `batchSize`: (Mandatory) The indexing batch size.
+* `StreamExpression`: (Mandatory)
+
+[[StreamingExpressions-Syntax.33]]
+==== Syntax
+
+[source,text]
+----
+update(destinationCollection,
+       batchSize=500,
+       search(collection1,
+              q=*:*,
+              fl="id,a_s,a_i,a_f,s_multi,i_multi",
+              sort="a_f asc, a_i asc"))
+----
+
+The example above sends the tuples returned by the `search` function to the `destinationCollection` to be indexed.
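+
+The effect of `batchSize` can be pictured with the Python sketch below, which splits a stream into fixed-size batches. It is a hedged illustration: the `batches` helper is hypothetical, and sending a final, smaller batch once the stream ends is an assumption made for the sketch rather than documented behavior.
+
```python
from itertools import islice


def batches(tuples, batch_size):
    """Yield lists of at most batch_size tuples from any iterable."""
    iterator = iter(tuples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # In a real pipeline each batch would be one indexing request.
        yield batch


# 1200 tuples with batchSize=500 (assumed final partial batch).
sizes = [len(b) for b in batches(range(1200), 500)]  # [500, 500, 200]
```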
+
+[[StreamingExpressions-StreamEvaluators]]
+== Stream Evaluators
+
+Stream Evaluators can be used to evaluate (calculate) new values based on other values in a tuple. That newly evaluated value can be put into the tuple (as part of a `select(...)` clause), used to filter streams (as part of a `having(...)` clause), and for other things. Evaluators can contain field names, raw values, or other evaluators, giving you the ability to create complex evaluation logic, including conditional if/then choices.
+
+In cases where you want to use raw values as part of an evaluation, you will need to consider the order in which parameters are parsed.
+
+1.  If the parameter can be parsed into a valid number, then it is considered a number. For example, `add(3,4.5)`
+2.  If the parameter can be parsed into a valid boolean, then it is considered a boolean. For example, `eq(true,false)`
+3.  If the parameter can be parsed into a valid evaluator, then it is considered an evaluator. For example, `eq(add(10,4),add(7,7))`
+4.  Otherwise, the parameter is considered a field name, even if it is quoted. For example, `eq(fieldA,"fieldB")`
+
+If you wish to use a raw string as part of an evaluation, you will want to consider using the `raw(string)` evaluator. This will always return the raw value, no matter what is entered.
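+
+The parsing precedence listed above can be expressed as a small Python classifier. This is a sketch of the stated order only: `classify` is a hypothetical helper, and the exact rules Solr uses to recognize numbers, booleans, and evaluators may differ from the simple checks assumed here.
+
```python
import re


def classify(param):
    """Classify a parameter string using the precedence listed above."""
    text = param.strip()
    # 1. Anything that parses as a number is a number.
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    # 2. Then booleans (assumed here to be the literals true and false).
    if text in ("true", "false"):
        return "boolean"
    # 3. Then evaluators, assumed here to look like name(arguments).
    if re.fullmatch(r"[A-Za-z_]\w*\(.*\)", text):
        return "evaluator"
    # 4. Everything else is a field name, even when quoted.
    return "field"


kinds = [classify(p) for p in ("4.5", "true", "add(10,4)", '"fieldB"')]
# kinds == ["number", "boolean", "evaluator", "field"]
```
+
+Note that the quoted value falls through to the field name case, which is why a literal string has to be wrapped in `raw(...)`.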
+
+[[StreamingExpressions-abs]]
+=== abs
+
+The `abs` function will return the absolute value of the provided single parameter. The `abs` function will fail to execute if the value is non-numeric. If a null value is found then null will be returned as the result.
+
+[[StreamingExpressions-Parameters.33]]
+==== Parameters
+
+* `Field Name | Raw Number | Number Evaluator`
+
+[[StreamingExpressions-Syntax.34]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `abs` evaluator. Only one parameter is accepted. Returns a numeric value.
+
+[source,text]
+----
+abs(1) // 1, not really a good use case for it
+abs(-1) // 1, not really a good use case for it
+abs(add(fieldA,fieldB)) // absolute value of fieldA + fieldB
+abs(fieldA) // absolute value of fieldA
+----
+
+[[StreamingExpressions-add]]
+=== add
+
+The `add` function will take 2 or more numeric values and add them together. The `add` function will fail to execute if any of the values are non-numeric. If a null value is found then null will be returned as the result.
+
+[[StreamingExpressions-Parameters.34]]
+==== Parameters
+
+* `Field Name | Raw Number | Number Evaluator`
+* `Field Name | Raw Number | Number Evaluator`
+* `......`
+* `Field Name | Raw Number | Number Evaluator`
+
+[[StreamingExpressions-Syntax.35]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `add` evaluator. The order of the parameters does not matter, and any number of parameters may be supplied as long as there are at least two. Returns a numeric value.
+
+[source,text]
+----
+add(1,2,3,4) // 1 + 2 + 3 + 4 == 10
+add(1,fieldA) // 1 + value of fieldA
+add(fieldA,1.4) // value of fieldA + 1.4
+add(fieldA,fieldB,fieldC) // value of fieldA + value of fieldB + value of fieldC
+add(fieldA,div(fieldA,fieldB)) // value of fieldA + (value of fieldA / value of fieldB)
+add(fieldA,if(gt(fieldA,fieldB),fieldA,fieldB)) // if fieldA > fieldB then fieldA + fieldA, else fieldA + fieldB
+----
+
+[[StreamingExpressions-div]]
+=== div
+
+The `div` function will take two numeric values and divide them. The function will fail to execute if any of the values are non-numeric or null, or the 2nd value is 0. Returns a numeric value.
+
+[[StreamingExpressions-Parameters.35]]
+==== Parameters
+
+* `Field Name | Raw Number | Number Evaluator`
+* `Field Name | Raw Number | Number Evaluator`
+
+[[StreamingExpressions-Syntax.36]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `div` evaluator. The first value will be divided by the second and as such the second cannot be 0.
+
+[source,text]
+----
+div(1,2) // 1 / 2
+div(1,fieldA) // 1 / fieldA
+div(fieldA,1.4) // fieldA / 1.4
+div(fieldA,add(fieldA,fieldB)) // fieldA / (fieldA + fieldB)
+----
+
+[[StreamingExpressions-log]]
+=== log
+
+The `log` function will return the natural log of the provided single parameter. The `log` function will fail to execute if the value is non-numeric. If a null value is found, then null will be returned as the result.
+
+[[StreamingExpressions-Parameters.36]]
+==== Parameters
+
+* `Field Name | Raw Number | Number Evaluator`
+
+[[StreamingExpressions-Syntax.37]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `log` evaluator. Only one parameter is accepted. Returns a numeric value.
+
+[source,text]
+----
+log(100)
+log(add(fieldA,fieldB))
+log(fieldA)
+----
+
+[[StreamingExpressions-mult]]
+=== mult
+
+The `mult` function will take two or more numeric values and multiply them together. The `mult` function will fail to execute if any of the values are non-numeric. If a null value is found then null will be returned as the result.
+
+[[StreamingExpressions-Parameters.37]]
+==== Parameters
+
+* `Field Name | Raw Number | Number Evaluator`
+* `Field Name | Raw Number | Number Evaluator`
+* `......`
+* `Field Name | Raw Number | Number Evaluator`
+
+[[StreamingExpressions-Syntax.38]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `mult` evaluator. The order of the parameters does not matter, and any number of parameters may be supplied as long as there are at least two. Returns a numeric value.
+
+[source,text]
+----
+mult(1,2,3,4) // 1 * 2 * 3 * 4
+mult(1,fieldA) // 1 * value of fieldA
+mult(fieldA,1.4) // value of fieldA * 1.4
+mult(fieldA,fieldB,fieldC) // value of fieldA * value of fieldB * value of fieldC
+mult(fieldA,div(fieldA,fieldB)) // value of fieldA * (value of fieldA / value of fieldB)
+mult(fieldA,if(gt(fieldA,fieldB),fieldA,fieldB)) // if fieldA > fieldB then fieldA * fieldA, else fieldA * fieldB
+----
+
+[[StreamingExpressions-sub]]
+=== sub
+
+The `sub` function will take two or more numeric values and subtract them, from left to right. The `sub` function will fail to execute if any of the values are non-numeric. If a null value is found then null will be returned as the result.
+
+[[StreamingExpressions-Parameters.38]]
+==== Parameters
+
+* `Field Name | Raw Number | Number Evaluator`
+* `Field Name | Raw Number | Number Evaluator`
+* `......`
+* `Field Name | Raw Number | Number Evaluator`
+
+[[StreamingExpressions-Syntax.39]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `sub` evaluator. Any number of parameters may be supplied as long as there are at least two; the order matters because the values are subtracted from left to right. Returns a numeric value.
+
+[source,text]
+----
+sub(1,2,3,4) // 1 - 2 - 3 - 4
+sub(1,fieldA) // 1 - value of fieldA
+sub(fieldA,1.4) // value of fieldA - 1.4
+sub(fieldA,fieldB,fieldC) // value of fieldA - value of fieldB - value of fieldC
+sub(fieldA,div(fieldA,fieldB)) // value of fieldA - (value of fieldA / value of fieldB)
+if(gt(fieldA,fieldB),sub(fieldA,fieldB),sub(fieldB,fieldA)) // if fieldA > fieldB then fieldA - fieldB, else fieldB - fieldA
+----
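+
+The left-to-right evaluation and the null handling described for `sub` can be modeled in Python as shown below. The `sub` function here is a hypothetical stand-in that uses `None` for null; it illustrates the documented rules and is not Solr's evaluator.
+
```python
from functools import reduce
from numbers import Number


def sub(*values):
    """Subtract values left to right, returning None if any value is None."""
    if len(values) < 2:
        raise ValueError("sub requires at least two parameters")
    if any(v is None for v in values):
        return None
    if not all(isinstance(v, Number) for v in values):
        raise TypeError("sub requires numeric parameters")
    # ((v0 - v1) - v2) - ... mirrors the left-to-right rule.
    return reduce(lambda acc, v: acc - v, values)


left_to_right = sub(1, 2, 3, 4)   # ((1 - 2) - 3) - 4 == -8
with_null = sub(10, None)         # None
```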
+
+[[StreamingExpressions-pow]]
+=== pow
+//TODO
+
+[[StreamingExpressions-mod]]
+=== mod
+//TODO
+
+[[StreamingExpressions-ceil]]
+=== ceil
+//TODO
+
+[[StreamingExpressions-floor]]
+=== floor
+//TODO
+
+[[StreamingExpressions-sin]]
+=== sin
+//TODO
+
+[[StreamingExpressions-asin]]
+=== asin
+//TODO
+
+[[StreamingExpressions-sinh]]
+=== sinh
+//TODO
+
+[[StreamingExpressions-cos]]
+=== cos
+//TODO
+
+[[StreamingExpressions-acos]]
+=== acos
+//TODO
+
+[[StreamingExpressions-atan]]
+=== atan
+//TODO
+
+[[StreamingExpressions-round]]
+=== round
+//TODO
+
+[[StreamingExpressions-sqrt]]
+=== sqrt
+//TODO
+
+[[StreamingExpressions-cbrt]]
+=== cbrt
+//TODO
+
+[[StreamingExpressions-and]]
+=== and
+
+The `and` function will return the logical AND of at least 2 boolean parameters. The function will fail to execute if any parameters are non-boolean or null. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.39]]
+==== Parameters
+
+* `Field Name | Raw Boolean | Boolean Evaluator`
+* `Field Name | Raw Boolean | Boolean Evaluator`
+* `......`
+* `Field Name | Raw Boolean | Boolean Evaluator`
+
+[[StreamingExpressions-Syntax.40]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `and` evaluator. At least two parameters are required, but there is no limit to how many you can use.
+
+[source,text]
+----
+and(true,fieldA) // true && fieldA
+and(fieldA,fieldB) // fieldA && fieldB
+and(or(fieldA,fieldB),fieldC) // (fieldA || fieldB) && fieldC
+and(fieldA,fieldB,fieldC,or(fieldD,fieldE),fieldF)
+----
+
+[[StreamingExpressions-eq]]
+=== eq
+
+The `eq` function will return whether all the parameters are equal, as per Java's standard `equals(...)` function. The function accepts parameters of any type, but will fail to execute unless all the parameters are of the same type. That is, all are Boolean, all are String, or all are Numeric. If any parameters are null and there is at least one parameter that is not null, then false will be returned. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.40]]
+==== Parameters
+
+* `Field Name | Raw Value | Evaluator`
+* `Field Name | Raw Value | Evaluator`
+* `......`
+* `Field Name | Raw Value | Evaluator`
+
+[[StreamingExpressions-Syntax.41]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `eq` evaluator.
+
+[source,text]
+----
+eq(1,2) // 1 == 2
+eq(1,fieldA) // 1 == fieldA
+eq(fieldA,val(foo)) // fieldA == "foo"
+eq(add(fieldA,fieldB),6) // fieldA + fieldB == 6
+----
+
+[[StreamingExpressions-eor]]
+=== eor
+
+The `eor` function will return the logical exclusive or of at least two boolean parameters. The function will fail to execute if any parameters are non-boolean or null. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.41]]
+==== Parameters
+
+* `Field Name | Raw Boolean | Boolean Evaluator`
+* `Field Name | Raw Boolean | Boolean Evaluator`
+* `......`
+* `Field Name | Raw Boolean | Boolean Evaluator`
+
+[[StreamingExpressions-Syntax.42]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `eor` evaluator. At least two parameters are required, but there is no limit to how many you can use.
+
+[source,text]
+----
+eor(true,fieldA) // true iff fieldA is false
+eor(fieldA,fieldB) // true iff either fieldA or fieldB is true but not both
+eor(eq(fieldA,fieldB),eq(fieldC,fieldD)) // true iff either fieldA == fieldB or fieldC == fieldD but not both
+----
+
+[[StreamingExpressions-gteq]]
+=== gteq
+
+The `gteq` function will return whether the first parameter is greater than or equal to the second parameter. The function accepts numeric or string parameters, but will fail to execute unless both parameters are of the same type. That is, both are String or both are Numeric. If either parameter is null then an error will be raised. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.42]]
+==== Parameters
+
+* `Field Name | Raw Value | Evaluator`
+* `Field Name | Raw Value | Evaluator`
+
+[[StreamingExpressions-Syntax.43]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `gteq` evaluator.
+
+[source,text]
+----
+gteq(1,2) // 1 >= 2
+gteq(1,fieldA) // 1 >= fieldA
+gteq(fieldA,val(foo)) // fieldA >= "foo"
+gteq(add(fieldA,fieldB),6) // fieldA + fieldB >= 6
+----
+
+[[StreamingExpressions-gt]]
+=== gt
+
+The `gt` function will return whether the first parameter is greater than the second parameter. The function accepts numeric or string parameters, but will fail to execute unless both parameters are of the same type. That is, both are String or both are Numeric. If either parameter is null then an error will be raised. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.43]]
+==== Parameters
+
+* `Field Name | Raw Value | Evaluator`
+* `Field Name | Raw Value | Evaluator`
+
+[[StreamingExpressions-Syntax.44]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `gt` evaluator.
+
+[source,text]
+----
+gt(1,2) // 1 > 2
+gt(1,fieldA) // 1 > fieldA
+gt(fieldA,val(foo)) // fieldA > "foo"
+gt(add(fieldA,fieldB),6) // fieldA + fieldB > 6
+----
+
+[[StreamingExpressions-if]]
+=== if
+
+The `if` function works like a standard conditional if/then statement. If the first parameter is true, then the second parameter will be returned, else the third parameter will be returned. The function accepts a boolean as the first parameter and anything as the second and third parameters. An error will occur if the first parameter is not a boolean or is null.
+
+[[StreamingExpressions-Parameters.44]]
+==== Parameters
+
+* `Field Name | Raw Value | Boolean Evaluator`
+* `Field Name | Raw Value | Evaluator`
+* `Field Name | Raw Value | Evaluator`
+
+[[StreamingExpressions-Syntax.45]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `if` evaluator.
+
+[source,text]
+----
+if(fieldA,fieldB,fieldC) // if fieldA is true then fieldB else fieldC
+if(gt(fieldA,5), fieldA, 5) // if fieldA > 5 then fieldA else 5
+if(eq(fieldB,null), null, div(fieldA,fieldB)) // if fieldB is null then null else fieldA / fieldB
+----
+
+[[StreamingExpressions-lteq]]
+=== lteq
+
+The `lteq` function will return whether the first parameter is less than or equal to the second parameter. The function accepts numeric or string parameters, but will fail to execute unless both parameters are of the same type. That is, both are String or both are Numeric. If either parameter is null then an error will be raised. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.45]]
+==== Parameters
+
+* `Field Name | Raw Value | Evaluator`
+* `Field Name | Raw Value | Evaluator`
+
+[[StreamingExpressions-Syntax.46]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `lteq` evaluator.
+
+[source,text]
+----
+lteq(1,2) // 1 <= 2
+lteq(1,fieldA) // 1 <= fieldA
+lteq(fieldA,val(foo)) // fieldA <= "foo"
+lteq(add(fieldA,fieldB),6) // fieldA + fieldB <= 6
+----
+
+[[StreamingExpressions-lt]]
+=== lt
+
+The `lt` function will return whether the first parameter is less than the second parameter. The function accepts numeric or string parameters, but will fail to execute unless both parameters are of the same type. That is, both are String or both are Numeric. If either parameter is null then an error will be raised. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.46]]
+==== Parameters
+
+* `Field Name | Raw Value | Evaluator`
+* `Field Name | Raw Value | Evaluator`
+
+[[StreamingExpressions-Syntax.47]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `lt` evaluator.
+
+[source,text]
+----
+lt(1,2) // 1 < 2
+lt(1,fieldA) // 1 < fieldA
+lt(fieldA,val(foo)) // fieldA < "foo"
+lt(add(fieldA,fieldB),6) // fieldA + fieldB < 6
+----
+
+[[StreamingExpressions-not]]
+=== not
+
+The `not` function will return the logical NOT of a single boolean parameter. The function will fail to execute if the parameter is non-boolean or null. Returns a boolean value.
+
+[[StreamingExpressions-Parameters.47]]
+==== Parameters
+
+* `Field Name | Raw Boolean | Boolean Evaluator`
+
+[[StreamingExpressions-Syntax.48]]
+==== Syntax
+
+The expressions below show the various ways in which you can use the `not` evaluator. Only one parameter is allowed.
+
+[source,text]
+----
+not(true) // false
+not(fieldA) // true if fieldA is false else false
+not(eq(fieldA,fieldB)) // true if fieldA != fieldB
+----
+
+[[StreamingExpressions-or]]
+=== or
+
+The `or` function will return the logical OR of at leas

<TRUNCATED>
