spark-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From wenc...@apache.org
Subject spark git commit: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `metadata directory` for a HDFS file path
Date Mon, 05 Mar 2018 22:29:07 GMT
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 9bd25c9bf -> 4864d2104


[SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `metadata directory` for a HDFS file
path

## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it warns with
a wrong warning message during looking up `people.json/_spark_metadata`. The root cause of
this situation is the difference between `LocalFileSystem` and `DistributedFileSystem`. `LocalFileSystem.exists()`
returns `false`, but `DistributedFileSystem.exists` raises `org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```

After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
```

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #20715 from dongjoon-hyun/SPARK-23434-2.2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4864d210
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4864d210
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4864d210

Branch: refs/heads/branch-2.2
Commit: 4864d21041374f2aca0846c7f22e8d0c93024b7a
Parents: 9bd25c9
Author: Dongjoon Hyun <dongjoon@apache.org>
Authored: Mon Mar 5 14:29:04 2018 -0800
Committer: Wenchen Fan <wenchen@databricks.com>
Committed: Mon Mar 5 14:29:04 2018 -0800

----------------------------------------------------------------------
 .../spark/sql/execution/streaming/FileStreamSink.scala       | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4864d210/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
index 6885d0b..397be27 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
@@ -42,9 +42,11 @@ object FileStreamSink extends Logging {
         try {
           val hdfsPath = new Path(singlePath)
           val fs = hdfsPath.getFileSystem(hadoopConf)
-          val metadataPath = new Path(hdfsPath, metadataDir)
-          val res = fs.exists(metadataPath)
-          res
+          if (fs.isDirectory(hdfsPath)) {
+            fs.exists(new Path(hdfsPath, metadataDir))
+          } else {
+            false
+          }
         } catch {
           case NonFatal(e) =>
             logWarning(s"Error while looking for metadata directory.")


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


Mime
View raw message