spark-reviews mailing list archives

From zjffdu <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-11102] [SQL] Uninformative exception wh...
Date Wed, 11 Nov 2015 14:04:49 GMT
Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9490#discussion_r44534648
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
    @@ -604,10 +609,33 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
           }
         }
     
    -    buildInternalScan(requiredColumns, filters, inputStatuses, broadcastedConf)
    +    if (!inputExists) {
    +      throw new IOException("Input paths do not exist, input paths="
    +        + inputPaths.mkString("[", ",", "]"))
    +    } else {
    +      if (inputStatuses.isEmpty && readFromHDFS) {
    +        logWarning("Input paths are empty, input paths=" + inputPaths.mkString("[", ",", "]"))
    +        sqlContext.sparkContext.emptyRDD[InternalRow]
    +      } else {
    +        buildInternalScan(requiredColumns, filters, inputStatuses, broadcastedConf)
    +      }
    +    }
       }
     
       /**
    +   * Most of time, HadoopFsRelation should check the inputPaths, but for some cases it is not,
    +   * e.g. JsonRelation may read from RDD[String]
    +   */
    +  def inputExists: Boolean = fileStatusCache.inputExists
    +
    +  /**
    +   * Most of time, HadoopFsRelation should read from hdfs, but some cases it is not,
    +   * e.g. JsonRelation may read from RDD[String]
    +   * @return
    +   */
    +  def readFromHDFS: Boolean = true
    --- End diff --
    
    Agreed, it's odd for a JsonRelation not to read from HDFS. One suggestion: split the
    RDD-backed JsonRelation into a new JsonRDDRelation that does not extend HadoopFsRelation.
    Since JsonRelation is private, this would not introduce any compatibility issues.
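
A rough sketch of the split, to make the suggestion concrete. It does not use the real Spark classes: `SketchRelation`, `SketchFsRelation`, `SketchJsonFileRelation` and `SketchJsonRDDRelation` are hypothetical stand-ins, rows are plain strings, and an in-memory `Map` plays the role of the file system. Treating "input exists" as "every path exists" is also an assumption, not something the diff confirms.

```scala
import java.io.IOException

// Hypothetical stand-in for a base relation: anything that can yield rows.
trait SketchRelation {
  def scan(): Seq[String]
}

// Hypothetical stand-in for HadoopFsRelation. The path checks from the diff
// live here, so only file-backed relations inherit them.
abstract class SketchFsRelation(val inputPaths: Seq[String]) extends SketchRelation {
  protected def pathExists(path: String): Boolean
  protected def readPath(path: String): Seq[String]

  final def scan(): Seq[String] = {
    // Assumption: "input exists" means every input path exists.
    if (!inputPaths.forall(pathExists)) {
      throw new IOException(
        "Input paths do not exist, input paths=" + inputPaths.mkString("[", ",", "]"))
    }
    // Paths that exist but hold no data simply contribute no rows.
    inputPaths.flatMap(readPath)
  }
}

// File-backed JSON relation; the Map is a fake file system (path -> lines).
class SketchJsonFileRelation(paths: Seq[String], fs: Map[String, Seq[String]])
    extends SketchFsRelation(paths) {
  protected def pathExists(path: String): Boolean = fs.contains(path)
  protected def readPath(path: String): Seq[String] = fs.getOrElse(path, Seq.empty)
}

// RDD-backed JSON relation: it never touches paths, so it extends only the
// base trait and needs no flag to opt out of file-system checks.
class SketchJsonRDDRelation(records: Seq[String]) extends SketchRelation {
  def scan(): Seq[String] = records
}
```

With this shape, the file-backed class throws for a missing path and returns no rows for an empty one, while the in-memory class has no path logic to bypass, so overridable switches such as `inputExists` and `readFromHDFS` would not be needed.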

