spark-reviews mailing list archives

From jaceklaskowski <...@git.apache.org>
Subject [GitHub] spark pull request #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Str...
Date Thu, 07 Jul 2016 20:55:15 GMT
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14087#discussion_r69985100
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala ---
    @@ -281,6 +281,31 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
       @Experimental
       def text(path: String): DataFrame = format("text").load(path)
     
    +  /**
    +   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    +   * contains a single string column named "value".
    +   *
    +   * If the directory structure of the text files contains partitioning information, those are
    +   * ignored in the resulting Dataset. To include partitioning information as columns, use `text`.
    +   *
    +   * Each line in the text files is a new element in the resulting Dataset. For example:
    +   * {{{
    +   *   // Scala:
    +   *   spark.read.textFile("/path/to/spark/README.md")
    +   *
    +   *   // Java:
    +   *   spark.read().textFile("/path/to/spark/README.md")
    +   * }}}
    +   *
    +   * @param path input path
    +   * @since 2.0.0
    +   */
    +  def textFile(path: String): Dataset[String] = {
    +    if (userSpecifiedSchema.nonEmpty) {
    +      throw new AnalysisException("User specified schema not supported with `textFile`")
    +    }
    +    text(path).select("value").as[String](sparkSession.implicits.newStringEncoder)
    --- End diff --
    
    I'm surprised that `sparkSession.implicits.newStringEncoder` has to be passed explicitly here. Why is `sparkSession.implicits._` not imported instead?
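
    For context, the two styles contrasted in this comment can be sketched in plain Scala. This is a minimal sketch built on stand-in types: `Encoder`, `Dataset`, `Session`, `Reader`, and the path string below are hypothetical and are not Spark's classes. It only illustrates that passing the encoder explicitly and importing the implicits resolve to the same encoder value; whether the import is appropriate inside `DataStreamReader` is the open question above.

```scala
// Stand-in types for illustration only; these are NOT Spark's classes.
trait Encoder[T] { def name: String }

final case class Dataset[T](rows: Seq[Any]) {
  // The encoder is an implicit parameter, so it can be resolved by the
  // compiler or passed explicitly in a second argument list.
  def as[U](implicit enc: Encoder[U]): Dataset[U] = Dataset[U](rows)
}

final class Session {
  // Hypothetical counterpart of an `implicits` object that exposes encoders.
  object implicits {
    implicit val newStringEncoder: Encoder[String] =
      new Encoder[String] { val name = "string" }
  }
}

final class Reader(session: Session) {
  // Hypothetical loader that returns two fixed rows regardless of the path.
  private def text(path: String): Dataset[Any] =
    Dataset[Any](Seq("line 1", "line 2"))

  // Style used in the diff: hand the encoder over explicitly.
  def textFileExplicit(path: String): Dataset[String] =
    text(path).as[String](session.implicits.newStringEncoder)

  // Style the comment asks about: import the implicits and let the
  // compiler find the encoder.
  def textFileImported(path: String): Dataset[String] = {
    import session.implicits._
    text(path).as[String]
  }
}

object ImplicitEncoderDemo {
  def main(args: Array[String]): Unit = {
    val reader = new Reader(new Session)
    val explicit = reader.textFileExplicit("/hypothetical/path")
    val imported = reader.textFileImported("/hypothetical/path")
    // Both calls build the same stand-in Dataset.
    println(explicit == imported)
  }
}
```

    In this sketch the import is scoped to a single method body, so the implicit values it brings in are not visible to the rest of the class.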

