Repository: lucenesolr
Updated Branches:
refs/heads/master ff1df8a15 > 531b16633
SOLR-12913: Add new facet expression and pivot docs
Project: http://gitwipus.apache.org/repos/asf/lucenesolr/repo
Commit: http://gitwipus.apache.org/repos/asf/lucenesolr/commit/531b1663
Tree: http://gitwipus.apache.org/repos/asf/lucenesolr/tree/531b1663
Diff: http://gitwipus.apache.org/repos/asf/lucenesolr/diff/531b1663
Branch: refs/heads/master
Commit: 531b16633acc8c398a20ca2a52b7ded3901702e6
Parents: ff1df8a
Author: Joel Bernstein <jbernste@apache.org>
Authored: Wed Nov 7 15:07:21 2018 -0500
Committer: Joel Bernstein <jbernste@apache.org>
Committed: Wed Nov 7 15:07:46 2018 -0500

.../src/streamsourcereference.adoc  14 +++
solr/solrrefguide/src/vectorization.adoc  80 ++++++++++++++++++++
2 files changed, 90 insertions(+), 4 deletions(-)

http://gitwipus.apache.org/repos/asf/lucenesolr/blob/531b1663/solr/solrrefguide/src/streamsourcereference.adoc

diff --git a/solr/solrrefguide/src/streamsourcereference.adoc b/solr/solrrefguide/src/streamsourcereference.adoc
index c31639a..c83991e 100644
--- a/solr/solrrefguide/src/streamsourcereference.adoc
+++ b/solr/solrrefguide/src/streamsourcereference.adoc
@@ -130,8 +130,12 @@ The `facet` function provides aggregations that are rolled up over buckets.
Unde
* `collection`: (Mandatory) Collection the facets will be aggregated from.
* `q`: (Mandatory) The query to build the aggregations from.
* `buckets`: (Mandatory) Comma separated list of fields to rollup over. The comma separated list represents the dimensions in a multidimensional rollup.
-* `bucketSorts`: Comma separated list of sorts to apply to each dimension in the buckets parameters. Sorts can be on the computed metrics or on the bucket values.
-* `bucketSizeLimit`: The number of buckets to include. This value is applied to each dimension. '-1' will fetch all the buckets.
+* `bucketSorts`: (Mandatory) Comma separated list of sorts to apply to each dimension in the `buckets` parameter. Sorts can be on the computed metrics or on the bucket values.
+* `rows`: (Default 10) The number of rows to return. '-1' will return all rows.
+* `offset`: (Default 0) The offset in the result set to start from.
+* `overfetch`: (Default 150) Overfetching is used to provide accurate aggregations over high cardinality fields.
+* `method`: The JSON facet API aggregation method.
+* `bucketSizeLimit`: Sets the absolute number of rows to fetch. This is incompatible with `rows`, `offset` and `overfetch`. This value is applied to each dimension. '-1' will fetch all the buckets.
* `metrics`: List of metrics to compute for the buckets. Currently supported metrics are
`sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`.
=== facet Syntax
@@ -144,7 +148,7 @@ facet(collection1,
q="*:*",
buckets="a_s",
bucketSorts="sum(a_i) desc",
- bucketSizeLimit=100,
+ rows=100,
sum(a_i),
sum(a_f),
min(a_i),
@@ -166,7 +170,8 @@ facet(collection1,
q="*:*",
buckets="year_i, month_i, day_i",
bucketSorts="year_i desc, month_i desc, day_i desc",
- bucketSizeLimit=100,
+ rows=10,
+ offset=20,
sum(a_i),
sum(a_f),
min(a_i),
@@ -179,6 +184,7 @@ facet(collection1,

The example above shows a facet function with rollups over three buckets, where the buckets
are returned in descending order by bucket value.
+The `rows` parameter returns 10 rows and the `offset` parameter starts returning rows at offset 20 in the result set.
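
The paging behaviour described for `rows` and `offset` can be sketched in Python. This is only an illustration and not Solr code: the `page` helper and the bucket names are hypothetical, and it assumes that `offset` is a zero-based count of rows to skip and that `rows=-1` means "return everything from the offset onward".

```python
def page(buckets, rows=10, offset=0):
    """Return one page from an already sorted list of buckets.

    Assumption: offset is the number of leading rows to skip,
    and rows=-1 returns every remaining row.
    """
    if rows == -1:
        return buckets[offset:]
    return buckets[offset:offset + rows]


# Hypothetical, already sorted bucket list standing in for facet output.
buckets = [f"bucket_{i}" for i in range(50)]

# Mirrors rows=10, offset=20 from the expression above.
result = page(buckets, rows=10, offset=20)
all_rows = page(buckets, rows=-1)
```

Under these assumptions the call returns ten buckets, beginning with the one at index 20.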
== features
http://gitwipus.apache.org/repos/asf/lucenesolr/blob/531b1663/solr/solrrefguide/src/vectorization.adoc

diff --git a/solr/solrrefguide/src/vectorization.adoc b/solr/solrrefguide/src/vectorization.adoc
index 5fdfadc..acd56ec 100644
--- a/solr/solrrefguide/src/vectorization.adoc
+++ b/solr/solrrefguide/src/vectorization.adoc
@@ -31,6 +31,12 @@ to vectorize and analyze the results sets.
Below are some of the key stream sources:
+* *`facet`*: Multi-dimensional aggregations are a powerful tool for generating
+co-occurrence counts for categorical data. The `facet` function uses the JSON facet API
+under the covers to provide fast, distributed, multi-dimensional aggregations. With math expressions
+the aggregated results can be pivoted into a co-occurrence matrix which can be mined for
+correlations and hidden similarities within the data.
+
* *`random`*: Random sampling is widely used in statistics, probability and machine learning.
The `random` function returns a random sample of search results that match a
query. The random samples can be vectorized and operated on by math expressions and the results
@@ -242,6 +248,80 @@ When this expression is sent to the `/stream` handler it responds with:
}

+== Facet Co-Occurrence Matrices
+
+The `facet` function can be used to quickly perform multi-dimensional aggregations of categorical data from
+records stored in a Solr Cloud collection. These multi-dimensional aggregations can represent co-occurrence
+counts for the values in the dimensions. The `pivot` function can be used to move two-dimensional
+aggregations into a co-occurrence matrix. The co-occurrence matrix can then be clustered or analyzed for
+correlations to learn about the hidden connections within the data.
+
+In the example below the `facet` expression is used to generate a two-dimensional faceted aggregation.
+The first dimension is the US State that a car was purchased in and the second dimension is the car model.
+The two-dimensional facet generates the co-occurrence counts for the number of times a particular car model
+was purchased in a particular state.
+
+
+[source,text]
+----
+facet(collection1, q="*:*", buckets="state, model", bucketSorts="count(*) desc", rows=5, count(*))
+----
+
+When this expression is sent to the `/stream` handler it responds with:
+
+[source,json]
+----
+{
+  "result-set": {
+    "docs": [
+      {
+        "state": "NY",
+        "model": "camry",
+        "count(*)": 13342
+      },
+      {
+        "state": "NJ",
+        "model": "accord",
+        "count(*)": 13002
+      },
+      {
+        "state": "NY",
+        "model": "civic",
+        "count(*)": 12901
+      },
+      {
+        "state": "CA",
+        "model": "focus",
+        "count(*)": 12892
+      },
+      {
+        "state": "TX",
+        "model": "f150",
+        "count(*)": 12871
+      },
+      {
+        "EOF": true,
+        "RESPONSE_TIME": 171
+      }
+    ]
+  }
+}
+----
+
+The `pivot` function can be used to move the facet results into a co-occurrence matrix. In the example below
+the `pivot` function is used to create a matrix where the rows of the matrix are the US States (state) and the
+columns of the matrix are the car models (model). The values in the matrix are the co-occurrence counts (count(*))
+from the facet results. Once the co-occurrence matrix has been created the US States can be clustered
+by car model, or the matrix can be transposed and car models can be clustered by the US States
+where they were bought.
+
+[source,text]
+----
+let(a=facet(collection1, q="*:*", buckets="state, model", bucketSorts="count(*) desc", rows="-1", count(*)),
+    b=pivot(a, state, model, count(*)),
+    c=kmeans(b, 7))
+----
+
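
To make the pivot step concrete, here is a minimal Python sketch of turning facet tuples into a co-occurrence matrix. It is an illustration of the idea only, not Solr's `pivot` implementation: the `pivot` helper below is hypothetical, it orders row and column labels alphabetically, and it assumes that state/model pairs missing from the facet output count as 0. The input tuples are the five rows from the sample response above.

```python
def pivot(tuples, row_field, col_field, value_field):
    """Build a dense matrix with one row per row_field value and one
    column per col_field value (illustrative helper, not Solr's pivot)."""
    row_labels = sorted({t[row_field] for t in tuples})
    col_labels = sorted({t[col_field] for t in tuples})
    row_index = {label: i for i, label in enumerate(row_labels)}
    col_index = {label: j for j, label in enumerate(col_labels)}

    # Assumption: pairs absent from the facet output are treated as 0.
    matrix = [[0] * len(col_labels) for _ in row_labels]
    for t in tuples:
        matrix[row_index[t[row_field]]][col_index[t[col_field]]] = t[value_field]
    return row_labels, col_labels, matrix


# The five facet tuples from the sample response (EOF tuple omitted).
docs = [
    {"state": "NY", "model": "camry", "count(*)": 13342},
    {"state": "NJ", "model": "accord", "count(*)": 13002},
    {"state": "NY", "model": "civic", "count(*)": 12901},
    {"state": "CA", "model": "focus", "count(*)": 12892},
    {"state": "TX", "model": "f150", "count(*)": 12871},
]

states, models, matrix = pivot(docs, "state", "model", "count(*)")

# Each row is one state's vector of counts across car models.
for state, row in zip(states, matrix):
    print(state, row)
```

Each row of the resulting matrix is the per-model count vector for one state, which is the kind of input a clustering step would operate on. Transposing the matrix gives one vector per car model instead.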
== Latitude / Longitude Vectors
The `latlonVectors` function wraps a list of tuples and parses a lat/lon location field into
