lucene-commits mailing list archives

Subject [lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Add text to loading page
Date Sun, 04 Aug 2019 18:17:21 GMT
This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch SOLR-13105-visual
in repository

The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
     new 472457d  SOLR-13105: Add text to loading page
472457d is described below

commit 472457d928061ea6faf90400e3c340827ad85e86
Author: Joel Bernstein <>
AuthorDate: Sun Aug 4 14:16:57 2019 -0400

    SOLR-13105: Add text to loading page
 solr/solr-ref-guide/src/loading.adoc          | 49 +++++++++++++++++++++++++++
 solr/solr-ref-guide/src/math-expressions.adoc |  2 +-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/solr/solr-ref-guide/src/loading.adoc b/solr/solr-ref-guide/src/loading.adoc
index 1db27fd..7009aba 100644
--- a/solr/solr-ref-guide/src/loading.adoc
+++ b/solr/solr-ref-guide/src/loading.adoc
@@ -17,14 +17,63 @@
 // under the License.
+Streaming Expressions allows CSV and TSV formatted data to be visualized and transformed
+before loading it into Solr Cloud collections. A number of useful functions are provided
+for parsing dates, creating unique ids, cleaning data, analyzing text and visualizing
+data, all before it is loaded into Solr Cloud collections.
 == Reading Files
+The `cat` function can be used to read files under the "userfiles" directory in
+SOLR_HOME. The `cat` function takes two parameters. The first parameter is a comma
+delimited list of paths. If the path list contains directories, `cat` will crawl
+all the files in each directory and its sub-directories. If the path list contains only
+files, `cat` will crawl just the specified files.
+The second parameter, *maxLines*, tells `cat` how many lines to read in total. If
+*maxLines* is not provided, `cat` will read all lines from each file it crawls.
+The `cat` function reads each line (up to *maxLines*) in the files and for each line
+emits a tuple with two fields:
+* line: The text in the line.
+* file: The relative path of the file under SOLR_HOME.
+Below is an example of `cat`.
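The behavior described above can be illustrated with a minimal Python sketch. It is not Solr's implementation: the function signature, the `root` argument (standing in for the directory that paths are resolved against), and the sorted crawl order are all assumptions made for illustration.

```python
import os


def cat(root, paths, max_lines=None):
    """Yield one {'line', 'file'} dict per line read (illustrative sketch only).

    root      -- hypothetical base directory that paths are resolved against
    paths     -- comma-delimited list of files and/or directories
    max_lines -- total number of lines to emit across all files (None = all)
    """
    emitted = 0
    for path in paths.split(","):
        full = os.path.join(root, path.strip())
        if os.path.isdir(full):
            # Crawl the directory and its sub-directories; sorting is an
            # assumption that keeps the output order deterministic.
            files = sorted(
                os.path.join(folder, name)
                for folder, _, names in os.walk(full)
                for name in names
            )
        else:
            files = [full]
        for file_path in files:
            with open(file_path, encoding="utf-8") as handle:
                for line in handle:
                    if max_lines is not None and emitted >= max_lines:
                        return
                    emitted += 1
                    yield {
                        "line": line.rstrip("\n"),
                        "file": os.path.relpath(file_path, root),
                    }
```

Note that `max_lines` caps the total across all files rather than per file, matching the "in total" wording above.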
 == Parsing CSV and TSV Files
+The `parseCSV` and `parseTSV` functions wrap the `cat` function and parse CSV
+(comma separated values) and TSV (tab separated values). Both of these functions
+expect a CSV or TSV header record at the beginning of each file.
+Both `parseCSV` and `parseTSV` emit tuples with header values mapped to their
+corresponding values in each line.
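As a rough illustration of the header mapping described above, the Python sketch below consumes `line`/`file` tuples and emits one dict per data line. It is a sketch under stated assumptions, not the Solr implementation: the name `parse_delimited` and its input shape are hypothetical, and quoting rules are delegated to Python's `csv` module.

```python
import csv


def parse_delimited(rows, delimiter=","):
    """Map each file's header record onto its data lines (illustrative sketch).

    rows      -- iterable of {'line': str, 'file': str} dicts
    delimiter -- "," for CSV, "\t" for TSV
    """
    headers = {}  # file name -> list of header names
    for row in rows:
        # csv.reader handles quoted fields; one input line gives one record.
        for fields in csv.reader([row["line"]], delimiter=delimiter):
            header = headers.get(row["file"])
            if header is None:
                # The first record seen for a file is treated as its header.
                headers[row["file"]] = fields
            else:
                yield dict(zip(header, fields))
```

Tracking the header per file reflects the expectation that every file starts with its own header record.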
 == Visualizing
 == Transforming Data
+=== Selecting fields
+=== Unique IDs
+Both `parseCSV` and `parseTSV` also emit an id field if one is not present in the records already.
+The id field is a concatenation of the file path and the line number. This is a
+convenient way to ensure that records have consistent, reproducible ids if one
+is not present in the file.
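The id rule above can be sketched in a few lines of Python. The text does not say how the file path and line number are joined, so the `_` separator here is an assumption, as are the function name and the input shape.

```python
def add_ids(rows, separator="_"):
    """Add a reproducible id to records that lack one (illustrative sketch).

    rows      -- iterable of (file, line_number, record) triples
    separator -- assumed joiner between file path and line number
    """
    for file, line_number, record in rows:
        if "id" in record:
            # Records that already carry an id are passed through unchanged.
            yield dict(record)
        else:
            yield {"id": f"{file}{separator}{line_number}", **record}
```

Because the id depends only on the file path and line number, reading the same file again produces the same ids.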
+=== Parsing Dates
+=== Handling Nulls
+=== String Manipulation
+=== Text Analysis
 == Loading Data
diff --git a/solr/solr-ref-guide/src/math-expressions.adoc b/solr/solr-ref-guide/src/math-expressions.adoc
index be266da..fab6921 100644
--- a/solr/solr-ref-guide/src/math-expressions.adoc
+++ b/solr/solr-ref-guide/src/math-expressions.adoc
@@ -55,7 +55,7 @@ image::images/math-expressions/curve-fitting.png[]
 *<<numerical-analysis.adoc#numerical-analysis,Interpolation, Derivatives and Integrals>>*: Numerical analysis math expressions.
-*<<dsp.adoc#dsp,Digital Signal Processing>>*: Functions commonly used with digital signal processing.
+*<<dsp.adoc#dsp,Signal Processing>>*: Functions commonly used with digital signal processing.
 *<<curve-fitting.adoc#curve-fitting,Curve Fitting>>*: Polynomial, Harmonic and Gaussian curve fitting.
