spark-issues mailing list archives

From "Jijiao Zeng (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-24047) use spark package to load csv file
Date Sun, 22 Apr 2018 20:13:00 GMT
Jijiao Zeng created SPARK-24047:
-----------------------------------

             Summary: use spark package to load csv file
                 Key: SPARK-24047
                 URL: https://issues.apache.org/jira/browse/SPARK-24047
             Project: Spark
          Issue Type: IT Help
          Components: Input/Output
    Affects Versions: 2.3.0
            Reporter: Jijiao Zeng


I am new to Spark. I used the spark.read.csv() function to read a local CSV file.
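For reference, here is a minimal sketch of the kind of call described above; the file path and reader options are placeholders for illustration, not the exact call that was run:

{code:python}
from pyspark.sql import SparkSession

# In the pyspark shell a SparkSession named `spark` already exists;
# this builder call is only needed in a standalone script.
spark = SparkSession.builder.appName("csv-load-sketch").getOrCreate()

# Hypothetical local file path; the actual path from the report is not shown.
df = spark.read.csv("/path/to/local_file.csv", header=True, inferSchema=True)
df.show(5)
{code}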

But I got the following error:

{noformat}
  File "<stdin>", line 1, in <module>
  File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 439, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o58.csv.
: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/pythonconverters
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/ml
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/hello
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/resources
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib/stat
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/als
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/sql
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark.egg-info
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/hello/sub_hello
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/licenses
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/parquet_partitioned
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/sbin
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/orc_partitioned
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/tests/testthat
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/profile
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql/hive
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/kubernetes/dockerfiles/spark
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/html
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib/linalg
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/jars
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/worker
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/graphx
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images/multi-channel
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/streaming/clickstream
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/conf
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/ml
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/sql/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/docs
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/ridge-data
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/help
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/ml
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/mllib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r/ml
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/meta
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/r
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/mllib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/bin
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/python/pyspark
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/mllib
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/yarn
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/ml/param
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/ml
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/streaming
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql/hive
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/jars
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images/kittens
    file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/graphx

If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:133)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:98)
    at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:153)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:71)
    at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
    at org.apache.spark.sql.execution.datasources.DataSource.combineInferredAndUserSpecifiedPartitionSchema(DataSource.scala:115)
    at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:166)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:594)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.base/java.lang.Thread.run(Thread.java:844)
{noformat}
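For what it is worth, the assertion message above names two possible workarounds. A minimal sketch of both follows; the directory names are placeholders chosen for illustration and are not taken from the report:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Workaround 1 (quoted from the error text): if the CSV files live under
# partition directories, set "basePath" to the table's root directory.
# Both paths here are hypothetical.
df = (spark.read
      .option("basePath", "/data/my_table")
      .csv("/data/my_table/year=2018"))

# Workaround 2 (also from the error text): if there are multiple root
# directories, load them separately and then union them. The two inputs
# must have matching schemas for union() to succeed.
df_a = spark.read.csv("/data/root_a", header=True)
df_b = spark.read.csv("/data/root_b", header=True)
combined = df_a.union(df_b)
{code}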

 

Any suggestions would be appreciated. Thanks in advance.



