From: dpgove@apache.org
To: commits@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Sun, 18 Jun 2017 15:54:47 -0000
Message-Id: <5f2775c2fa134f80bacd0b7e95e0a67f@git.apache.org>
Subject: [1/8] lucene-solr:master: Adds documentation for the cartesianProduct
archived-at: Sun, 18 Jun 2017 15:54:51 -0000

Repository: lucene-solr
Updated Branches:
  refs/heads/master 42fdb5492 -> 943bf5ab5

Adds documentation for the cartesianProduct

Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/fffbe67b
Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/fffbe67b
Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/fffbe67b

Branch: refs/heads/master
Commit: fffbe67b3b0f919f828df294916c071341f473f0
Parents: 42fdb54
Author: Dennis Gove
Authored: Tue Jun 13 08:12:32 2017 -0400
Committer: Dennis Gove
Committed: Sun Jun 18 11:50:46 2017 -0400

----------------------------------------------------------------------
 solr/solr-ref-guide/src/stream-decorators.adoc | 364 ++++++++++++++++++++
 1 file changed, 364 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/fffbe67b/solr/solr-ref-guide/src/stream-decorators.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/stream-decorators.adoc b/solr/solr-ref-guide/src/stream-decorators.adoc
index c4ab4f4..e65f18a 100644
--- a/solr/solr-ref-guide/src/stream-decorators.adoc
+++ b/solr/solr-ref-guide/src/stream-decorators.adoc
@@ -20,6 +20,370 @@
 // specific language governing permissions and limitations
 // under the License.
 
+== cartesianProduct
+
+The `cartesianProduct` function turns a single tuple with a multi-valued field (i.e., an array) into multiple tuples, one for each value in the array field. That is, given a single tuple containing an array of N values for fieldA, the `cartesianProduct` function outputs N tuples, each identical to the original tuple except that fieldA holds a single value from the original array. In essence, it flattens arrays so they can be processed further.
+
+For example, using `cartesianProduct` you can turn this tuple:
+[source,text]
+----
+{
+  "fieldA": "foo",
+  "fieldB": ["bar","baz","bat"]
+}
+----
+
+into the following 3 tuples:
+[source,text]
+----
+{
+  "fieldA": "foo",
+  "fieldB": "bar"
+}
+{
+  "fieldA": "foo",
+  "fieldB": "baz"
+}
+{
+  "fieldA": "foo",
+  "fieldB": "bat"
+}
+----
+
+=== cartesianProduct Parameters
+
+* `incoming stream`: (Mandatory) A single incoming stream.
+* `fieldName or evaluator`: (Mandatory) Name of the field whose values should be flattened, or an evaluator whose result should be flattened. More than one may be given, as the examples below show.
+* `productSort='fieldName ASC|DESC'`: (Optional) Sort order of the newly generated tuples.
+
+=== cartesianProduct Syntax
+
+[source,text]
+----
+cartesianProduct(
+  ,
+  [as newFieldName],
+  productSort='fieldName ASC|DESC'
+)
+----
+
+=== cartesianProduct Examples
+
+The following examples show different outputs for this source tuple:
+
+[source,text]
+----
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3]
+}
+----
+
+==== Single Field, No Sorting
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  fieldB
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": [1,2,3]
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": [1,2,3]
+}
+----
+
+==== Single Evaluator, No Sorting
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  sequence(3,4,5) as fieldE
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "fieldE": 4
+}
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "fieldE": 9
+}
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "fieldE": 14
+}
+----
+
+==== Single Field, Sorted by Value
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  fieldB,
+  productSort="fieldB DESC"
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": [1,2,3]
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": [1,2,3]
+}
+----
+
+==== Single Evaluator, Sorted by Evaluator Values
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  sequence(3,4,5) as fieldE,
+  productSort='fieldE DESC'
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "fieldE": 14
+}
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "fieldE": 9
+}
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "fieldE": 4
+}
+----
+
+==== Renamed Single Field, Sorted by Value
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  fieldB as newFieldB,
+  productSort="newFieldB DESC"
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "newFieldB": "valueB2"
+}
+{
+  "fieldA": "valueA",
+  "fieldB": ["valueB1","valueB2"],
+  "fieldC": [1,2,3],
+  "newFieldB": "valueB1"
+}
+----
+
+==== Multiple Fields, No Sorting
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  fieldB,
+  fieldC
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 1
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 2
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 3
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 1
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 2
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 3
+}
+----
+
+==== Multiple Fields, Sorted by Single Field
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  fieldB,
+  fieldC,
+  productSort="fieldC ASC"
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 1
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 1
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 2
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 2
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 3
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 3
+}
+----
+
+==== Multiple Fields, Sorted by Multiple Fields
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  fieldB,
+  fieldC,
+  productSort="fieldC ASC, fieldB DESC"
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 1
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 1
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 2
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 2
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": 3
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": 3
+}
+----
+
+==== Field and Evaluator, No Sorting
+
+[source,text]
+----
+cartesianProduct(
+  search(collection1, q='*:*', fl='fieldA, fieldB, fieldC', sort='fieldA ASC'),
+  sequence(3,4,5) as fieldE,
+  fieldB
+)
+
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": [1,2,3],
+  "fieldE": 4
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": [1,2,3],
+  "fieldE": 4
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": [1,2,3],
+  "fieldE": 9
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": [1,2,3],
+  "fieldE": 9
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB1",
+  "fieldC": [1,2,3],
+  "fieldE": 14
+}
+{
+  "fieldA": "valueA",
+  "fieldB": "valueB2",
+  "fieldC": [1,2,3],
+  "fieldE": 14
+}
+----
+
+As the examples above show, `cartesianProduct` can flatten tuples across multiple fields and/or evaluators. When more than one is given, it emits one tuple for every combination of their values.
+
 == classify
 
 The `classify` function classifies tuples using a logistic regression text classification model. It was designed specifically to work with models trained using the <>. The `classify` function uses the <> to retrieve a stored model and then scores a stream of tuples using the model. The tuples read by the classifier must contain a text field that can be used for classification. The `classify` function uses a Lucene analyzer to extract the features from the text so the model can be applied. By default the `classify` function looks up the analyzer using the name of the text field in the tuple. If the Solr schema on the worker node does not contain this field, the analyzer can be looked up in another field by specifying the `analyzerField` parameter.
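The flattening and `productSort` behavior documented for `cartesianProduct` in the diff above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Solr's implementation. It assumes a tuple is a plain `dict` and that only `list` values are multi-valued. It lets the first listed field vary slowest, which matches the order in the "Multiple Fields, No Sorting" example. It also takes `productSort` as a list of `(field, ascending)` pairs. The function name `cartesian_product` and that pair format are hypothetical. Renaming with `as` and evaluators are not modeled.

```python
from itertools import product


def cartesian_product(source, fields, product_sort=None):
    """Flatten the list-valued `fields` of one tuple into many tuples.

    A sketch of the documented cartesianProduct behavior, not Solr's code:
    the first field in `fields` varies slowest, every other field is copied
    unchanged, and `product_sort` is a hypothetical list of
    (field, ascending) pairs, with the first pair taking precedence.
    """
    # Assumption: a scalar is treated as a one-element list, so it passes through as-is.
    value_lists = [
        source[f] if isinstance(source[f], list) else [source[f]] for f in fields
    ]
    rows = []
    for combo in product(*value_lists):
        row = dict(source)              # shallow copy of all original fields
        row.update(zip(fields, combo))  # replace each flattened field with one value
        rows.append(row)
    # list.sort is stable (also with reverse=True), so sorting by the least
    # significant key first yields a multi-key sort led by the first pair.
    for field, ascending in reversed(product_sort or []):
        rows.sort(key=lambda r: r[field], reverse=not ascending)
    return rows


if __name__ == "__main__":
    source = {"fieldA": "valueA", "fieldB": ["valueB1", "valueB2"], "fieldC": [1, 2, 3]}
    # Rough equivalent of productSort="fieldC ASC, fieldB DESC"
    for row in cartesian_product(source, ["fieldB", "fieldC"],
                                 product_sort=[("fieldC", True), ("fieldB", False)]):
        print(row["fieldB"], row["fieldC"])
    # prints, one pair per line: valueB2 1, valueB1 1, valueB2 2, valueB1 2, valueB2 3, valueB1 3
```

The sketch runs one stable sort per key, from the last key to the first, instead of building a composite comparator. This keeps the mixed ASC/DESC handling simple. An evaluator such as `sequence(3,4,5) as fieldE` can be approximated by first storing its result under the new name and then flattening it. For example, `cartesian_product(dict(source, fieldE=[4, 9, 14]), ["fieldE"])` uses the fieldE values shown in the examples.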