lucene-commits mailing list archives

From jbern...@apache.org
Subject [lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: Add text to loading page
Date Sun, 04 Aug 2019 18:17:21 GMT
This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
     new 472457d  SOLR-13105: Add text to loading page
472457d is described below

commit 472457d928061ea6faf90400e3c340827ad85e86
Author: Joel Bernstein <jbernste@apache.org>
AuthorDate: Sun Aug 4 14:16:57 2019 -0400

    SOLR-13105: Add text to loading page
---
 solr/solr-ref-guide/src/loading.adoc          | 49 +++++++++++++++++++++++++++
 solr/solr-ref-guide/src/math-expressions.adoc |  2 +-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/solr/solr-ref-guide/src/loading.adoc b/solr/solr-ref-guide/src/loading.adoc
index 1db27fd..7009aba 100644
--- a/solr/solr-ref-guide/src/loading.adoc
+++ b/solr/solr-ref-guide/src/loading.adoc
@@ -17,14 +17,63 @@
 // under the License.
 
 
+Streaming Expressions allows CSV and TSV formatted data to be visualized and transformed
+before it is loaded into Solr Cloud collections. Functions are provided for parsing
+dates, creating unique ids, cleaning data, analyzing text and visualizing data, all
+before the data is loaded.
+
 == Reading Files
 
+The `cat` function can be used to read files under the "userfiles" directory in
+SOLR_HOME. The `cat` function takes two parameters. The first parameter is a comma
+delimited list of paths. If the path list contains directories, `cat` will crawl
+all the files in those directories and their sub-directories. If the path list
+contains only files, `cat` will crawl just those files.
+
+The second parameter, *maxLines*, tells `cat` how many lines to read in total. If
+*maxLines* is not provided, `cat` will read all lines from each file it crawls.
+
+The `cat` function reads each line (up to *maxLines*) in the files and for each line
+emits a tuple with two fields:
+
+* line: The text in the line.
+* file: The relative path of the file under SOLR_HOME.
+
+Below is an example of `cat`.
+
+
 == Parsing CSV and TSV Files
 
+The `parseCSV` and `parseTSV` functions wrap the `cat` function and parse CSV
+(comma separated values) and TSV (tab separated values). Both of these functions
+expect a CSV or TSV header record at the beginning of each file.
+
+Both `parseCSV` and `parseTSV` emit tuples with header values mapped to their
+corresponding values in each line.
+
+
 == Visualizing
 
 == Transforming Data
 
+=== Selecting fields
+
+=== Unique IDs
+
+Both `parseCSV` and `parseTSV` also emit an id field if one is not already present in
+the records. The id field is a concatenation of the file path and the line number.
+This is a convenient way to ensure that records have consistent, reproducible ids if
+one is not present in the file.
+
+
+=== Parsing Dates
+
+=== Handling Nulls
+
+=== String Manipulation
+
+=== Text Analysis
+
 == Loading Data
 
 
diff --git a/solr/solr-ref-guide/src/math-expressions.adoc b/solr/solr-ref-guide/src/math-expressions.adoc
index be266da..fab6921 100644
--- a/solr/solr-ref-guide/src/math-expressions.adoc
+++ b/solr/solr-ref-guide/src/math-expressions.adoc
@@ -55,7 +55,7 @@ image::images/math-expressions/curve-fitting.png[]
 
 *<<numerical-analysis.adoc#numerical-analysis,Interpolation, Derivatives and Integrals>>*: Numerical analysis math expressions.
 
-*<<dsp.adoc#dsp,Digital Signal Processing>>*: Functions commonly used with digital signal processing.
+*<<dsp.adoc#dsp,Signal Processing>>*: Functions commonly used with digital signal processing.
 
 *<<curve-fitting.adoc#curve-fitting,Curve Fitting>>*: Polynomial, Harmonic and Gaussian curve fitting.
 


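The behaviour that the new loading.adoc text describes for `cat`, `parseCSV` and `parseTSV` can be illustrated with a small Python sketch. This is not Solr code and not a Streaming Expression. The Python functions `cat` and `parse_csv`, the `root` argument (standing in for the directory the files are read from), the `_` separator in the generated id, and the 1-based line numbering that counts the header as line 1 are all assumptions made for illustration. The commit text only says that the id is a concatenation of the file path and the line number.

```python
import csv
from itertools import islice
from pathlib import Path


def cat(root, paths, max_lines=None):
    """Emit one {"line", "file"} tuple per line read.

    Directories in `paths` are crawled recursively; plain files are read
    directly. `max_lines` caps the total number of lines across all files.
    """
    def lines():
        for p in paths:
            target = Path(root) / p
            if target.is_dir():
                files = sorted(f for f in target.rglob("*") if f.is_file())
            else:
                files = [target]
            for f in files:
                # Path reported relative to the (hypothetical) root directory.
                rel = f.relative_to(root).as_posix()
                with f.open(encoding="utf-8") as fh:
                    for text in fh:
                        yield {"line": text.rstrip("\n"), "file": rel}

    # islice with stop=None reads everything, mirroring an omitted maxLines.
    return islice(lines(), max_lines)


def parse_csv(tuples, delimiter=","):
    """Map header names to values for each line emitted by cat().

    The first line of each file is treated as the header record. An "id"
    is added only when the record does not already carry one.
    """
    headers = {}   # file -> list of header names
    line_no = {}   # file -> number of lines seen so far
    for t in tuples:
        f = t["file"]
        line_no[f] = line_no.get(f, 0) + 1
        if not t["line"]:
            continue  # skip blank lines
        fields = next(csv.reader([t["line"]], delimiter=delimiter))
        if f not in headers:
            headers[f] = fields
            continue
        record = dict(zip(headers[f], fields))
        # Assumed id format: "<file path>_<line number>".
        record.setdefault("id", f"{f}_{line_no[f]}")
        yield record
```

Calling `parse_csv(cat(root, ["data"]))` would crawl a hypothetical `data` directory under `root`, and passing `delimiter="\t"` gives the TSV variant. Deriving the id from the file path and line number means re-reading the same files produces the same ids, which is the reproducibility the commit text points to.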