spark-commits mailing list archives

From marmb...@apache.org
Subject spark git commit: [DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line
Date Tue, 16 Dec 2014 22:03:03 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-1.2 6bd8a9666 -> 4f9916f1e


[DOCS][SQL] Add a Note on jsonFile having separate JSON objects per line

* This commit hopes to avoid the confusion I faced when trying
  to submit a regular, valid multi-line JSON file, also see

  http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html

Author: Peter Vandenabeele <peter@vandenabeele.com>

Closes #3517 from petervandenabeele/pv-docs-note-on-jsonFile-format/01 and squashes the following commits:

1f98e52 [Peter Vandenabeele] Revert to people.json and simple Note text
6b6e062 [Peter Vandenabeele] Change the "JSON" connotation to "txt"
fca7dfb [Peter Vandenabeele] Add a Note on jsonFile having separate JSON objects per line

(cherry picked from commit 1a9e35e57ab80984b81802ffc461d19cc9239edd)
Signed-off-by: Michael Armbrust <michael@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4f9916f1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4f9916f1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4f9916f1

Branch: refs/heads/branch-1.2
Commit: 4f9916f1e8ffb1ffc647a036ee35702d7d7e6646
Parents: 6bd8a96
Author: Peter Vandenabeele <peter@vandenabeele.com>
Authored: Tue Dec 16 13:57:55 2014 -0800
Committer: Michael Armbrust <michael@databricks.com>
Committed: Tue Dec 16 13:58:19 2014 -0800

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4f9916f1/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index be284fb..7e3e9c0 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -625,6 +625,10 @@ This conversion can be done using one of two methods in a SQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight scala %}
 // sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.SQLContext(sc)
@@ -663,6 +667,10 @@ This conversion can be done using one of two methods in a JavaSQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight java %}
 // sc is an existing JavaSparkContext.
 JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc);
@@ -701,6 +709,10 @@ This conversion can be done using one of two methods in a SQLContext:
 * `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
 * `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
 
+Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
+line must contain a separate, self-contained valid JSON object. As a consequence,
+a regular multi-line JSON file will most often fail.
+
 {% highlight python %}
 # sc is an existing SparkContext.
 from pyspark.sql import SQLContext
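
The note added by this commit says each line of the input must be a separate, self-contained JSON object. As a rough illustration of what that means in practice, the sketch below rewrites a regular multi-line JSON document into that one-object-per-line form using only the Python standard library. The helper name `to_json_lines` and the sample records are hypothetical and not part of the commit or of Spark.

```python
import json


def to_json_lines(document_text):
    """Return one compact JSON object per line for a JSON document.

    Hypothetical helper: accepts either a single JSON object or a
    top-level array of objects and emits line-delimited text.
    """
    parsed = json.loads(document_text)
    # A lone object becomes a one-record list so both cases share a path.
    records = parsed if isinstance(parsed, list) else [parsed]
    # json.dumps without `indent` never emits raw newlines, so each
    # record stays on a single line.
    return "\n".join(json.dumps(record) for record in records)


# A regular, valid multi-line JSON document (sample data, for illustration).
multi_line_document = """
[
  {"name": "Michael"},
  {
    "name": "Andy",
    "age": 30
  }
]
"""

json_lines = to_json_lines(multi_line_document).splitlines()
for line in json_lines:
    print(line)
```

Text in this shape, written to a file, matches the per-line layout that the note describes for `jsonFile`; the original multi-line document does not.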

