This is an automated email from the ASF dual-hosted git repository.
ctargett pushed a commit to branch jira/solr-13105-toMerge
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/jira/solr-13105-toMerge by this push:
new 0253a62 Fix a some typos, parameter/field name formatting
0253a62 is described below
commit 0253a62acf15d7545fac606d2125c879fa852c79
Author: Cassandra Targett <ctargett@apache.org>
AuthorDate: Thu Jan 7 15:34:41 2021 -0600
Fix a some typos, parameter/field name formatting

 solr/solr-ref-guide/src/loading.adoc               |  85 ++++
 solr/solr-ref-guide/src/math-expressions.adoc      |   1 +
 solr/solr-ref-guide/src/math-start.adoc            |  42 ++
 .../src/probability-distributions.adoc             |   2 +
 solr/solr-ref-guide/src/search-sample.adoc         | 209 +++++++++
 solr/solr-ref-guide/src/streaming-expressions.adoc |   4 +
 6 files changed, 151 insertions(+), 192 deletions(-)
diff --git a/solr/solr-ref-guide/src/loading.adoc b/solr/solr-ref-guide/src/loading.adoc
index e49103f..6b10b4b 100644
--- a/solr/solr-ref-guide/src/loading.adoc
+++ b/solr/solr-ref-guide/src/loading.adoc
@@ -16,7 +16,6 @@
// specific language governing permissions and limitations
// under the License.

Streaming Expressions has support for reading, parsing, transforming, visualizing
and loading CSV and TSV formatted data. These functions are designed to cut down the
time spent on data preparation and allow users to begin data exploration before the data
is
@@ -25,21 +24,21 @@ loaded into Solr.
== Reading Files
The `cat` function can be used to read files under the *userfiles* directory in
-$SOLR_HOME. The `cat` function takes two parameters. The first parameter is a comma
-delimited list of paths. If the path list contains directories, `cat` will crawl
-all the files in the directory and subdirectories. If the path list contains only
-files `cat` will read just the specific files.
+$SOLR_HOME. The `cat` function takes two parameters.
+
+The first parameter is a comma-delimited list of paths.
+If the path list contains directories, `cat` will crawl all the files in the directory and
subdirectories.
+If the path list contains only files `cat` will read just the specific files.
-The second parameter, *maxLines*, tells `cat` how many lines to read in total. If
-*maxLines* is not provided, `cat` will read all lines from each file it crawls.
+The second parameter, `maxLines`, tells `cat` how many lines to read in total.
+If `maxLines` is not provided, `cat` will read all lines from each file it crawls.
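Putting the two parameters together, a `cat` call over the iris data might look like the following sketch (the file name and quoting here are illustrative, not taken from this commit):

[source,text]
----
cat("iris.csv", maxLines="5")
----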
-The `cat` function reads each line (up to maxLines) in the crawled files and for each line
emits a tuple with two fields:
+The `cat` function reads each line (up to `maxLines`) in the crawled files and for each line
emits a tuple with two fields:
-* line: The text in the line.
-* file: The relative path of the file under $SOLR_HOME.
+* `line`: The text in the line.
+* `file`: The relative path of the file under $SOLR_HOME.
-Below is an example of `cat` on the iris.csv file with a maxLines of 5:
+Below is an example of `cat` on the iris.csv file with a `maxLines` of `5`:
[source,text]
----
@@ -154,9 +153,9 @@ The example below shows the output of the `parseCSV` function visualized
as a ta
image::images/mathexpressions/csvtable.png[]
-Columns from the table can then be visualized using one of Apache Zeppelins
-visualizations. The example below shows a scatter plot of the petal_length and petal_width
-grouped by species.
+Columns from the table can then be visualized using one of Apache Zeppelin's
+visualizations. The example below shows a scatter plot of the `petal_length` and `petal_width`
+grouped by `species`.
image::images/mathexpressions/csv.png[]
@@ -178,12 +177,11 @@ image::images/mathexpressions/csvselect.png[]
== Loading
When the data is ready to load, the `update` function can be used to send the
-data to a Solr Cloud collection for indexing. The `update` function adds documents to Solr
in batches
-and returns a tuple for each batch with summary information about the batch and load.
+data to a SolrCloud collection for indexing.
+The `update` function adds documents to Solr in batches and returns a tuple for each batch
with summary information about the batch and load.
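The load step described above can be sketched as follows; `destination` is a placeholder collection name, and the `parseCSV(cat(...))` inner stream is assumed from the surrounding sections:

[source,text]
----
update(destination, parseCSV(cat("iris.csv")))
----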
-In the example below the update expression is run using Zeppelin-Solr because the
-data set is small. For larger loads its best to run the load from a curl command
-where the output of the `update` function can be spooled to disk.
+In the example below the update expression is run using Zeppelin-Solr because the data set
is small.
+For larger loads it's best to run the load from a curl command where the output of the `update`
function can be spooled to disk.
image::images/mathexpressions/update.png[]
@@ -191,7 +189,7 @@ image::images/mathexpressions/update.png[]
Streaming Expressions and Math Expression provide a powerful set of functions
for transforming data. The section below shows some useful transformations that
-can be applied while analyzing, visualizing and loading CSV and TSV files.
+can be applied while analyzing, visualizing, and loading CSV and TSV files.
=== Unique IDs
@@ -220,7 +218,7 @@ image::images/mathexpressions/selectuuid.png[]
The `recNum` function can be used inside of a `select` function to add a record number
to each tuple. The record number is useful for tracking location in the result set
and can be used for filtering strategies such as skipping, paging and striding described
in
-the *filtering* section below.
+the <<Filtering Results>> section below.
The example below shows the syntax of the `recNum` function:
@@ -229,8 +227,8 @@ image::images/mathexpressions/recNum.png[]
=== Parsing Dates
-The `dateTime` function can be used to parse dates into ISO 8601 format
-needed for loading into a Solr date time field.
+The `dateTime` function can be used to parse dates into the ISO-8601 format
+needed for loading into a Solr date field.
We can first inspect the format of the date time field in the CSV file:
@@ -261,11 +259,11 @@ When this expression is sent to the `/stream` handler it responds with:
}
----
-Then we can use the dateTime function to format the datetime and
-map it to a Solr datetime field.
+Then we can use the `dateTime` function to format the datetime and
+map it to a Solr date field.
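Such a mapping might be sketched as shown below; the input field name `date_s`, the `SimpleDateFormat` template, and the explicit time zone are all illustrative assumptions:

[source,text]
----
select(parseCSV(cat("data.csv")),
       dateTime(date_s, "MM/dd/yyyy HH:mm:ss", "UTC") as date_dt)
----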
The `dateTime` function takes three parameters. The field in the data
-with the date string, a template to parse the date using a Java SimpleDateFormat template,
+with the date string, a template to parse the date using a Java https://docs.oracle.com/javase/9/docs/api/java/text/SimpleDateFormat.html[`SimpleDateFormat`
template],
and an optional time zone.
If the time zone is not present the time zone defaults to GMT time unless
@@ -303,7 +301,7 @@ When this expression is sent to the `/stream` handler it responds with:
=== String Manipulation
-The `upper`, `lower`, `split`, `valueAt`, `trim` and `concat` functions can be used to manipulate
+The `upper`, `lower`, `split`, `valueAt`, `trim`, and `concat` functions can be used to manipulate
strings inside of the `select` function.
The example below shows the `upper` function used to upper case the *species*
@@ -316,7 +314,7 @@ a delimiter. This can be used to create multivalue fields from fields
with an internal delimiter.
The example below demonstrates this with a direct call to
-the /stream handler:
+the `/stream` handler:
[source,text]
----
@@ -371,10 +369,10 @@ image::images/mathexpressions/valueat.png[]
=== Filtering Results
-The `having` function can be used to filter records. Filtering can be used to systematically
-explore specific record sets before indexing or to filter records that are sent for indexing.
-The `having` function wraps another stream and applies
-a boolean function to each tuple. If the boolean logic function returns true the tuple is
returned.
+The `having` function can be used to filter records.
+Filtering can be used to systematically explore specific record sets before indexing or to
filter records that are sent for indexing.
+The `having` function wraps another stream and applies a boolean function to each tuple.
+If the boolean logic function returns true the tuple is returned.
The following boolean functions are supported: `eq`, `gt`, `gteq`, `lt`, `lteq`, `matches`,
`and`, `or`,
`not`, `notNull`, `isNull`.
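A sketch combining `having` with one of the boolean functions listed above; the `species` field and its value are illustrative, as is the wrapped `parseCSV(cat(...))` stream:

[source,text]
----
having(parseCSV(cat("iris.csv")),
       eq(species, "setosa"))
----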
@@ -422,15 +420,14 @@ image::images/mathexpressions/matches.png[]
In most cases nulls do not need to be handled directly unless there is specific logic needed
to handle nulls during the load.
-The `select` function does not output fields that contain a null value. This means
-as nulls are encountered in the data the fields are not included in the tuples.
+The `select` function does not output fields that contain a null value.
+This means as nulls are encountered in the data the fields are not included in the tuples.
-The string manipulation functions all return null if they encounter a null. This means
-the null will be passed through to the `select` function and the fields with nulls
-will simply be left off the record.
+The string manipulation functions all return null if they encounter a null.
+This means the null will be passed through to the `select` function and the fields with nulls
will simply be left off the record.
-In certain scenarios it can be important to directly filter or replace nulls. The sections
-below cover these scenarios.
+In certain scenarios it can be important to directly filter or replace nulls.
+The sections below cover these scenarios.
==== Filtering Nulls
@@ -454,17 +451,17 @@ The `if` function and `isNull`, `notNull` functions can be combined
to replace n
In the example below the `if` function applies the `isNull` boolean expression to two different
fields.
In the first example it replaces null *petal_width* values with 0, and returns the *petal_width*
if present.
-In the second example it replace null *field1* values with the string literal "NA" and returns
*field1* if present.
+In the second example it replaces null *field1* values with the string literal "NA" and returns
*field1* if present.
image::images/mathexpressions/ifIsNull.png[]
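The two replacements described above might be sketched as follows; the field names come from the text, while the wrapping stream is assumed:

[source,text]
----
select(parseCSV(cat("iris.csv")),
       if(isNull(petal_width), 0, petal_width) as petal_width,
       if(isNull(field1), "NA", field1) as field1)
----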
=== Text Analysis
The `analyze` function can be used from inside a `select` function to analyze
-a text field with a Lucene/Solr analyzer. The output of `analyze` is a
-list of analyzed tokens which can be added to each tuple as a multivalued field.
+a text field with a Lucene/Solr analyzer.
+The output of `analyze` is a list of analyzed tokens which can be added to each tuple as
a multivalued field.
-The multivalue field can then be sent to Solr for indexing or the `cartesianProduct`
+The multivalued field can then be sent to Solr for indexing or the `cartesianProduct`
function can be used to expand the list of tokens to a stream of tuples.
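A sketch of `analyze` inside `select`; the signature shown (the value first, then the field whose analyzer is applied) is an assumption, as are the collection and `body_t` field names:

[source,text]
----
select(search(logs, q="*:*", fl="id,body_t", sort="id asc"),
       id,
       analyze(body_t, body_t) as terms)
----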
There are a number of interesting use cases for the `analyze` function:
diff --git a/solr/solr-ref-guide/src/math-expressions.adoc b/solr/solr-ref-guide/src/math-expressions.adoc
index 343696e..9ab7e56 100644
--- a/solr/solr-ref-guide/src/math-expressions.adoc
+++ b/solr/solr-ref-guide/src/math-expressions.adoc
@@ -1,5 +1,6 @@
= Streaming Expressions and Math Expressions
:page-children: visualization, math-start, loading, search-sample, transform, scalar-math,
vector-math, variables, matrix-math, term-vectors, statistics, probability-distributions,
simulations, time-series, regression, numerical-analysis, curve-fitting, dsp, machine-learning,
computational-geometry, logs
+:page-show-toc: false
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
diff --git a/solr/solr-ref-guide/src/math-start.adoc b/solr/solr-ref-guide/src/math-start.adoc
index 0124923..ca79113 100644
--- a/solr/solr-ref-guide/src/math-start.adoc
+++ b/solr/solr-ref-guide/src/math-start.adoc
@@ -16,11 +16,10 @@
// specific language governing permissions and limitations
// under the License.

== Language
*Streaming Expressions* and *Math Expressions* are function languages that run
-inside Solr Cloud. The languages consist of functions
+inside SolrCloud. The languages consist of functions
that are designed to be *composed* to form programming logic.
*Streaming Expressions* are functions that return streams of tuples. Streaming Expression
functions can be
@@ -33,42 +32,39 @@ arrays and matrices. The core use case for Math Expressions is performing mathem
mathem
visualization.
Streaming Expressions and Math Expressions can be combined to *search,
-sample, aggregate, transform, analyze* and *visualize* data in Solr Cloud collections.
+sample, aggregate, transform, analyze* and *visualize* data in SolrCloud collections.
== Execution
-Solr's /stream handler executes Streaming Expressions and Math Expressions.
-The /stream handler compiles the expression, runs the expression logic
+Solr's `/stream` request handler executes Streaming Expressions and Math Expressions.
+This handler compiles the expression, runs the expression logic
and returns a JSON result.
-=== Admin Stream Panel
+=== Admin UI Stream Panel
The easiest way to run Streaming Expressions and Math expressions is through
-the *stream* panel on the Solr admin
+the *stream* panel on the Solr Admin
UI.
A sample *search* Streaming Expression is shown in the screenshot below:

image::images/mathexpressions/search.png[]

A sample *add* Math Expression is shown in the screenshot below:

image::images/mathexpressions/add.png[]
=== Curl Example
-The http interface to the /stream handler can be used to
-send an streaming expression request and retrieve the response.
+The HTTP interface to the `/stream` handler can be used to
+send a streaming expression request and retrieve the response.
Curl is a useful tool for running streaming expressions when the result
needs to be spooled to disk or is too large for the Solr admin stream panel. Below
-is an example of a curl command to the /stream handler.
+is an example of a curl command to the `/stream` handler.
-[source,text]
+[source,bash]
----
curl --data-urlencode 'expr=search(enron_emails,
q="from:1800flowers*",
@@ -79,7 +75,7 @@ curl --data-urlencode 'expr=search(enron_emails,
The JSON response from the stream handler for this request is shown below:
-[source,text]
+[source,json]
----
{"resultset":{"docs":[
{"from":"1800flowers.133139412@s2u2.com","to":"lcampbel@enron.com"},
@@ -105,12 +101,10 @@ The JSON response from the stream handler for this request is shown
below:
The visualizations in this guide were performed with Apache Zeppelin using the
Zeppelin-Solr interpreter.
-=== Zeppelin-Solr
+=== Zeppelin-Solr Interpreter
-The Zeppelin-Solr interpreter allows Streaming Expressions and Math Expressions
-to be executed and results visualized in Apache Zeppelin. The instructions for
-installing and configuring Zeppelin-Solr can be found on the Github repository for
-the project:
+A Zeppelin interpreter for Solr allows Streaming Expressions and Math Expressions to be executed
and results visualized in Apache Zeppelin.
+The instructions for installing and configuring Zeppelin-Solr can be found on the Github
repository for the project:
https://github.com/lucidworks/zeppelin-solr
Once installed the Solr Interpreter can be configured to connect to your Solr instance.
@@ -118,18 +112,18 @@ The screenshot below shows the panel for configuring Zeppelin-Solr.
image::images/mathexpressions/zepconf.png[]
-Configure the solr.baseUrl and solr.collection to point to the location where the Streaming
-Expressions and Math Expressions will be sent for execution. The solr.collection is
+Configure the `solr.baseUrl` and `solr.collection` to point to the location where the Streaming
+Expressions and Math Expressions will be sent for execution. The `solr.collection` is
just the execution collection and does not need to hold data, although it can hold data.
Streaming Expressions can choose to query any of the collections that are attached
-to the same Solr Cloud as the execution collection.
+to the same SolrCloud as the execution collection.
=== zplot
Streaming Expression result sets can be visualized automatically by ZeppelinSolr.
Math Expression results need to be formatted for visualization using the `zplot` function.
-The `zplot` function has support for plotting *vectors*, *matrices*, *probability distributions*
and
+This function has support for plotting *vectors*, *matrices*, *probability distributions*
and
*2D clustering results*.
There are many examples in the guide which show how to visualize both Streaming Expressions
diff --git a/solr/solr-ref-guide/src/probability-distributions.adoc b/solr/solr-ref-guide/src/probability-distributions.adoc
index 09b757b..472d043 100644
--- a/solr/solr-ref-guide/src/probability-distributions.adoc
+++ b/solr/solr-ref-guide/src/probability-distributions.adoc
@@ -38,7 +38,7 @@ The `empiricalDistribution` function creates a continuous probability
distribution from actual data.
Empirical distributions can be used to conveniently visualize the probability density
-function of a random sample from a Solr Cloud
+function of a random sample from a SolrCloud
collection. The example below shows the zplot function visualizing the probability
density of a random sample with a 32 bin histogram.
diff --git a/solr/solr-ref-guide/src/search-sample.adoc b/solr/solr-ref-guide/src/search-sample.adoc
index 76ad251..8bf5a1e 100644
--- a/solr/solr-ref-guide/src/search-sample.adoc
+++ b/solr/solr-ref-guide/src/search-sample.adoc
@@ -25,35 +25,32 @@ and aggregation.
=== Exploring
-The *search* function can be used to search a Solr Cloud collection and return a
+The `search` function can be used to search a SolrCloud collection and return a
result set.
-Below is an example of the most basic *search* function called from the Zeppelin-Solr interpreter.
-Zeppelin-Solr sends the *search(logs)* call to the /stream handler and displays the results
+Below is an example of the most basic `search` function called from the Zeppelin-Solr interpreter.
+Zeppelin-Solr sends the `search(logs)` call to the `/stream` handler and displays the results
in *table* format.

-In the example the search function is passed only the name of the collection being searched.
This returns
-a result set of 10 records with all fields. This simple function is useful
-for exploring the fields in the data and understanding how to start refining the search criteria.
+In the example the `search` function is passed only the name of the collection being searched.
+This returns a result set of 10 records with all fields.
+This simple function is useful for exploring the fields in the data and understanding how
to start refining the search criteria.
image::images/mathexpressions/search1.png[]
=== Searching and Sorting
-Once the format of the records is known, parameters can be added to the *search* function
to begin analyzing
-the data.
+Once the format of the records is known, parameters can be added to the `search` function
to begin analyzing the data.
-In the example below a search query, field list, rows and sort have been added to the search
-function. Now the search is limited to records within a specific time range and returns
-a max result set of 750 records sorted by tdate_dt ascending. We have also limited the result
set to three specific
-fields.
+In the example below a search query, field list, rows and sort have been added to the `search`
function.
+Now the search is limited to records within a specific time range and returns
+a maximum result set of 750 records sorted by `tdate_dt` ascending.
+We have also limited the result set to three specific fields.
image::images/mathexpressions/searchsort.png[]
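The parameterized call in the screenshot might be sketched as follows; the field names come from the text, while the exact date range is illustrative:

[source,text]
----
search(logs,
       q="tdate_dt:[2019-01-01T00:00:00Z TO 2019-02-01T00:00:00Z]",
       fl="tdate_dt, filesize_d, response_d",
       rows="750",
       sort="tdate_dt asc")
----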

-Once the data is loaded into the table we can switch to a scatter plot and plot the *filesize_d*
column
-on the *x axis* and the *response_d* column on the *y axis*.
+Once the data is loaded into the table we can switch to a scatter plot and plot the `filesize_d`
column
+on the *x axis* and the `response_d` column on the *y axis*.
image::images/mathexpressions/searchsortplot.png[]
@@ -70,40 +67,33 @@ image::images/mathexpressions/scoring.png[]
== Sampling
The `random` function returns a random sample from a distributed search result set.
-This allows for fast visualization, statistical analysis and modeling of
+This allows for fast visualization, statistical analysis, and modeling of
samples that can be used to infer information about the larger result set.
-The visualization examples below use small random samples, but
-Solr's random sampling provides sub-second
-response times on sample sizes of over 200,000. These larger samples can be used to build
-reliable statistical models that describe large data sets (billions of
-documents) with sub-second performance.
+The visualization examples below use small random samples, but Solr's random sampling provides
sub-second response times on sample sizes of over 200,000.
+These larger samples can be used to build reliable statistical models that describe large
data sets (billions of documents) with sub-second performance.
-The examples below demonstrate univariate and bivariate scatter
-plots of random samples. Statistical modeling with random samples
-is covered in the Statistics, Probability, Linear Regression, Curve Fitting
-and Machine Learning sections of the user guide.
+plots of random samples.
+Statistical modeling with random samples
+is covered in the <<statistics.adoc,Statistics>>, <<probability-distributions.adoc,Probability>>,
<<regression.adoc,Linear Regression>>, <<curve-fitting.adoc,Curve Fitting>>,
+and <<machine-learning.adoc,Machine Learning>> sections.
=== Univariate Scatter Plots
In the example below the `random` function is called in its simplest form with just a collection
name as the parameter.
+When called with no other parameters the `random` function returns a random sample of 500
records with all fields from the collection.
+When called without the field list parameter (`fl`) the `random` function also generates
a sequence, 0-499 in this case, which can be used for plotting the `x` axis.
+This sequence is returned in a field called `x`.
-When called with no other parameters the `random` function returns a random sample
-of 500 records with all fields from
-the collection. When called without the *field list* parameter the `random` function also
generates
-a sequence, 0-499 in this case, which can be used
-for plotting the `x` axis. This sequence is
-returned in a field called `x`.

-The visualization below shows a scatter plot with the *filesize_d* field
-plotted on the `y` axis and the `x` sequence
-plotted on the `x` axis. The effect of this is to spread the
-*filesize_d* samples across the length
+The visualization below shows a scatter plot with the `filesize_d` field
+plotted on the `y` axis and the `x` sequence plotted on the `x` axis.
+The effect of this is to spread the `filesize_d` samples across the length
of the plot so they can be more easily studied.
By studying the scatter plot we can learn a number of things about the
-distribution of the *filesize_d* variable:
+distribution of the `filesize_d` variable:
* The sample set ranges from 34,875 to 45,902.
* The highest density appears to be at about 40,000.
@@ -112,71 +102,63 @@ distribution of the *filesize_d* variable:
* The number of observations tapers off to a small number of outliers on
the low and high end of the sample.
-This sample can be rerun multiple times to see if the samples
+This sample can be rerun multiple times to see if the samples
produce similar plots.
image::images/mathexpressions/univariate.png[]
=== Bivariate Scatter Plots
-In the next example parameters have been added to the `random` function. The field list (*fl*)
-now specifies two fields to be
-returned with each sample: *filesize_d* and *response_d*. The `q` and `rows` parameters are
the same
-as the defaults but are included as an example of how to set these parameters.
+In the next example parameters have been added to the `random` function.
+The field list (`fl`) now specifies two fields to be
+returned with each sample: `filesize_d` and `response_d`.
+The `q` and `rows` parameters are the same as the defaults but are included as an example
of how to set these parameters.
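A sketch of the parameterized `random` call described above; the collection name is assumed to be the logs collection used in the earlier examples:

[source,text]
----
random(logs,
       q="*:*",
       fl="filesize_d, response_d",
       rows="500")
----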
-By plotting *filesize_d* on the *x* axis and *response_d* on the y axis we can begin to study
the relationship between the two variables.
+By plotting `filesize_d` on the *x* axis and `response_d` on the *y* axis we can begin to
study the relationship between the two variables.
By studying the scatter plot we can learn the following:
-* As *filesize_d* rises *response_d* tends to rise.
-* This relationship appears to be linear, as a straight line put through the data could
be used to model the relationship.
-* The points appear to cluster more densely along a straight line through the middle
and become less dense as they move away from the line.
-* The variance of the data at each *filesize_d* point seems fairly consistent. This means
a predictive model would have consistent error across the range of predictions.
+* As `filesize_d` rises, `response_d` tends to rise.
+* This relationship appears to be linear, as a straight line put through the data could be
used to model the relationship.
+* The points appear to cluster more densely along a straight line through the middle and
become less dense as they move away from the line.
+* The variance of the data at each `filesize_d` point seems fairly consistent. This means
a predictive model would have consistent error across the range of predictions.
image::images/mathexpressions/bivariate.png[]
== Aggregation
Aggregations are a powerful statistical tool for summarizing large data sets and
-surfacing patterns, trends and correlations within the data. Aggregations are also a powerful
-tool for visualization and provide data sets for further statistical analysis.
+surfacing patterns, trends, and correlations within the data.
+Aggregations are also a powerful tool for visualization and provide data sets for further
statistical analysis.
=== stats
-The simplest aggregation is the `stats` function. The `stats` function calculates
-aggregations for an entire result set that matches a query. The `stats` function supports
-the following aggregation functions: count(*), sum, min, max and avg. Any number
-and combination of statistics can be calculated in a single function call.

+The simplest aggregation is the `stats` function.
+The `stats` function calculates aggregations for an entire result set that matches a query.
+The `stats` function supports the following aggregation functions: `count(*)`, `sum`, `min`,
`max`, and `avg`.
+Any number and combination of statistics can be calculated in a single function call.
-The `stats` function can be visualized
-in Zeppelin-Solr as a table. In the example below two statistics are calculated
-over a result set and are displayed in a table:
+The `stats` function can be visualized in Zeppelin-Solr as a table.
+In the example below two statistics are calculated over a result set and are displayed in
a table:
image::images/mathexpressions/statstable.png[]
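A two-statistic table like the one above might be produced by a call such as this sketch; the choice of `count(*)` and `avg` over `filesize_d` is illustrative:

[source,text]
----
stats(logs,
      q="*:*",
      count(*),
      avg(filesize_d))
----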
-The `stats` function can also be visualized using the *number* visualization which is
-used to highlight important numbers. The example below shows the `count(*)` aggregation
-displayed in the number visualization:
+The `stats` function can also be visualized using the *number* visualization which is used
to highlight important numbers.
+The example below shows the `count(*)` aggregation displayed in the number visualization:
image::images/mathexpressions/stats.png[]

=== facet
The `facet` function performs single and multi-dimension
aggregations that behave in a similar manner to SQL group by aggregations.
Under the covers the `facet` function pushes down the aggregations to Solr's
-JSON facet api for fast distributed execution.
+<<json-facet-api.adoc,JSON Facet API>> for fast distributed execution.
-The example below performs a single dimension aggregation from the
-nyc311 (NYC complaints) collection. The aggregation returns the top five
-*complaint types* by *count* for records with a status of *Pending*. The results is displayed
-with Zeppelin-Solr in a table.
+nyc311 (NYC complaints) dataset.
+The aggregation returns the top five *complaint types* by *count* for records with a status
of *Pending*.
+The results are displayed with Zeppelin-Solr in a table.
image::images/mathexpressions/facettab1.png[]
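A sketch of the single-dimension `facet` call described above; the `status_s` field name and the parameter spellings (`buckets`, `bucketSorts`, `rows`) are assumptions, not taken from this commit:

[source,text]
----
facet(nyc311,
      q="status_s:Pending",
      buckets="complaint_type_s",
      bucketSorts="count(*) desc",
      rows="5",
      count(*))
----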
@@ -184,10 +166,9 @@ The example below shows the table visualized using a pie chart.
image::images/mathexpressions/facetviz1.png[]
-The next example demonstrates a multi-dimension aggregation. Notice that
-the *buckets* parameter now
-contains two dimensions: *borough_s* and *complaint_type_s*. This returns the top 20
-combinations of borough and complaint type by count.
+The next example demonstrates a multi-dimension aggregation.
+Notice that the `buckets` parameter now contains two dimensions: `borough_s` and `complaint_type_s`.
+This returns the top 20 combinations of borough and complaint type by count.
image::images/mathexpressions/facettab2.png[]
@@ -206,7 +187,7 @@ visualized as heat maps or pivoted into matrices and operated on by machine
lear
`facet2D` has different syntax and behavior than a two-dimensional `facet` function which
does not control the number of unique facets of each dimension. The `facet2D` function
-has the *dimensions* parameter which controls the number of unique facets
+has the `dimensions` parameter which controls the number of unique facets
for the *x* and *y* dimensions.
The example below visualizes the output of the `facet2D` function. In the example `facet2D`
@@ -215,8 +196,7 @@ then visualized as a heatmap.
image::images/mathexpressions/facet2D.png[]
-The `facet2D` function supports one of the following aggregate functions: count(*), sum,
-avg, min, max.
+The `facet2D` function supports one of the following aggregate functions: `count(*)`, `sum`,
`avg`, `min`, `max`.
=== timeseries
@@ -231,40 +211,33 @@ The output of the `timeseries` function is then visualized with a line
chart.
image::images/mathexpressions/timeseries1.png[]
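A `timeseries` call that could back a line chart like the one above might be sketched as follows; the date field, range, and gap values are illustrative:

[source,text]
----
timeseries(logs,
           q="*:*",
           field="tdate_dt",
           start="2019-01-01T00:00:00Z",
           end="2019-02-01T00:00:00Z",
           gap="+1DAY",
           count(*))
----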
-The `timeseries` function supports any combination of the following aggregate functions:
count(*), sum, avg, min,
max.
+The `timeseries` function supports any combination of the following aggregate functions:
`count(*)`, `sum`, `avg`, `min`, `max`.
=== significantTerms
-The `significantTerms` function queries a collection,
-but instead of returning documents, it returns significant terms found in
-documents in the result set. The `significantTerms` function scores terms
-based on how frequently they appear in the result set and how rarely
-they appear in the entire corpus. The `significantTerms` function emits a
-tuple for each term which contains the term, the score,
-the foreground count and the background count. The foreground count is
-how many documents the term appears in in the result set.
+The `significantTerms` function queries a collection, but instead of returning documents,
it returns significant terms found in documents in the result set.
+This function scores terms based on how frequently they appear in the result set and how
rarely they appear in the entire corpus.
+The `significantTerms` function emits a tuple for each term which contains the term, the
score, the foreground count and the background count.
+The foreground count is how many documents the term appears in, in the result set.
The background count is how many documents the term appears in, in the entire corpus.
The foreground and background counts are global for the collection.
-The `significantTerms` function can often provide insights that cannot be gleaned from
-other types of aggregations. The example below illustrates the difference between
-the `facet` function and the `significantTerms` function.
+The `significantTerms` function can often provide insights that cannot be gleaned from other
types of aggregations.
+The example below illustrates the difference between the `facet` function and the `significantTerms`
function.
-In the first example the `facet` function aggregates the top 5 complaint types
-in Brooklyn. This returns the five most common complaint types in Brooklyn, but
+in Brooklyn.
+This returns the five most common complaint types in Brooklyn, but
it's not clear that these terms appear more frequently in Brooklyn than
in the other boroughs.
image::images/mathexpressions/significantTermsCompare.png[]
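As an illustrative sketch, the `facet` side of the comparison above might be written as follows. The collection name `nyc311` and the field `borough_s` are assumptions for this dataset; only `complaint_type_s` is named in the text.

[source,text]
----
facet(nyc311,
      q="borough_s:Brooklyn",
      buckets="complaint_type_s",
      bucketSorts="count(*) desc",
      bucketSizeLimit=5,
      count(*))
----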
+In the next example the `significantTerms` function returns the top 5 significant terms in
the `complaint_type_s` field for the borough of Brooklyn.
+The highest scoring term, Elder Abuse, has a foreground count of 285 and background count
of 298.
+This means that there were 298 Elder Abuse complaints in the entire data set, and 285 of
them were in Brooklyn.
+This shows that Elder Abuse complaints have a much higher occurrence rate in Brooklyn than
the other boroughs.
image::images/mathexpressions/significantTerms2.png[]
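A sketch of the `significantTerms` call shown above might look like the following. Again, the collection name `nyc311` and the `borough_s` field are assumptions; `complaint_type_s` comes from the text.

[source,text]
----
significantTerms(nyc311,
                 q="borough_s:Brooklyn",
                 field="complaint_type_s",
                 limit=5)
----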
@@ 281,41 +254,35 @@ image::images/mathexpressions/sterms.png[]
=== nodes
The `nodes` function performs aggregations of nodes during a breadth first search of a graph.
+This function is covered in detail in the section <<graphtraversal.adoc#graphtraversal,Graph
Traversal>>.
+In this example the focus will be on finding correlated nodes in a time series
graph using the `nodes` expression.
+The example below finds stock tickers whose daily movements tend to be correlated with the
ticker *jpm* (JP Morgan).
The inner `search` expression finds records within a specific date range
+where the ticker symbol is *jpm* and the `change_d` field (daily change in stock price) is
greater than .25.
+This search returns all fields in the index including the `yearMonthDay_s` field, which is
the string representation of the year, month, and day of the matching records.
+
+The `nodes` function wraps the `search` function and operates over its results. The `walk`
parameter maps a field from the search results to a field in the index.
+In this case the `yearMonthDay_s` is mapped back to the `yearMonthDay_s` field in the same
index.
+This will find records that have the same `yearMonthDay_s` field value returned
by the initial search, and will return records for all tickers on those days.
+A filter query is applied to the search to restrict the results to rows that have a `change_d`
+greater than .25.
+This will find all records on the matching days that have a daily change greater than .25.
+The `gather` parameter tells the `nodes` expression to gather the `ticker_s` symbols during
the breadth-first search.
+The `count(*)` parameter counts the occurrences of the tickers.
This will count the number of times each ticker appears in the breadth-first search.
Finally, the `top` function selects the top 5 tickers by count and returns them.
+The result below shows the ticker symbols in the `nodes` field and the counts for each node.
+Notice *jpm* is first, which shows how many days *jpm* had a change greater than .25 in this
time
+period.
+The next set of ticker symbols (*mtb*, *slvb*, *gs* and *pnc*) are the symbols with the highest
number of days with a change greater than .25 on the same days that *jpm* had a change greater
than .25.
image::images/mathexpressions/nodestab.png[]
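The steps described above could be sketched as a single expression like the following. The collection name `stocks` and the date range are assumptions for illustration; the fields `ticker_s`, `change_d`, and `yearMonthDay_s` come from the text.

[source,text]
----
top(n=5,
    sort="count(*) desc",
    nodes(stocks,
          search(stocks,
                 q="ticker_s:jpm AND change_d:[.25 TO *] AND yearMonthDay_s:[20120101 TO 20121231]",
                 fl="yearMonthDay_s",
                 sort="yearMonthDay_s asc"),
          walk="yearMonthDay_s->yearMonthDay_s",
          fq="change_d:[.25 TO *]",
          gather="ticker_s",
          count(*)))
----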
+The `nodes` function supports any combination of the following aggregate functions: `count(*)`,
`sum`, `avg`, `min`, `max`.
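For example, several aggregations can be gathered in one pass, as in this sketch (same assumed `stocks` collection as above):

[source,text]
----
nodes(stocks,
      search(stocks,
             q="ticker_s:jpm AND change_d:[.25 TO *]",
             fl="yearMonthDay_s",
             sort="yearMonthDay_s asc"),
      walk="yearMonthDay_s->yearMonthDay_s",
      gather="ticker_s",
      count(*),
      avg(change_d),
      max(change_d))
----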
diff git a/solr/solrrefguide/src/streamingexpressions.adoc b/solr/solrrefguide/src/streamingexpressions.adoc
index a0524b0..af448b2 100644
 a/solr/solrrefguide/src/streamingexpressions.adoc
+++ b/solr/solrrefguide/src/streamingexpressions.adoc
@@ 17,8 +17,8 @@
// specific language governing permissions and limitations
// under the License.
+Streaming Expressions exposes the capabilities of SolrCloud as composable functions. These
functions provide a system for
+searching, transforming, analyzing and visualizing data stored in SolrCloud collections.
At a high level there are four main capabilities that will be explored in the documentation:
