spark-reviews mailing list archives

From anabranch <...@git.apache.org>
Subject [GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Date Sun, 08 Jan 2017 04:56:35 GMT
Github user anabranch commented on the issue:

    https://github.com/apache/spark/pull/16138
  
    I believe I now understand why my previous implementation did not work.
    
    My implementation originally looked like this:
    
    ```scala
    case class ParseToTimestamp(left: Expression, format: Expression, child: Expression)
      extends RuntimeReplaceable {
    
      def this(left: Expression, format: Expression) = {
        this(left, format, Cast(UnixTimestamp(left, format), TimestampType))
      }
    
      override def checkInputDataTypes(): TypeCheckResult = {
        if (left.dataType != StringType) {
          TypeCheckResult.TypeCheckFailure(s"TO_TIMESTAMP requires both inputs to be strings")
        } else {
          TypeCheckResult.TypeCheckSuccess
        }
      }
    
      override def flatArguments: Iterator[Any] = Iterator(left, format)
      override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
    
      override def prettyName: String = "to_timestamp"
      override def dataType: DataType = TimestampType
    }
    ```
    
    This implementation fails on a simple example:
    
    ```scala
    import org.apache.spark.sql.functions._
    
    val ss1 = "2015-07-24 10:00:00"
    val ss2 = "2015-07-25 02:02:02"
    val df2 = Seq((ss1), (ss2)).toDF("ss")
    
    df2.select(to_timestamp(col("ss"))).show
    ```
    This throws:
    
    ```
    org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'ss
    ```
    
    A `TRACE` log level shows that the columns are resolved; however, the error originates after
    analysis, during `checkInputDataTypes`. That method inspects the `left` input, but because the
    column is passed into a `RuntimeReplaceable` expression, the relevant (and resolved) tree is
    actually the `child` argument. `left` remains unresolved, so calling `left.dataType` throws the
    error above.
    
    I believe this is the root cause, and it has in turn shown me that I do not need to perform
    input validation for this function in the first place. Since it only wraps existing
    expressions, those expressions already perform exactly the validation I would. No new logic is
    implemented, so there is no point in redundantly validating something that will be validated
    again anyway, especially when the system won't let me.
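
    The mechanism described above can be illustrated with a small toy model. This is only a
    sketch and does not use Spark's real classes: `Expr`, `UnresolvedAttr`, `ResolvedAttr`,
    `Wrapped`, `Replaceable` and `resolve` are all hypothetical stand-ins. It assumes, as the
    explanation above suggests, that resolution only rewrites an expression's children, so a
    constructor argument that is not a child (`left`) keeps its unresolved form.

    ```scala
    import scala.util.Try

    // Toy model only: every name below is a hypothetical stand-in, not a Spark API.
    object ToyAnalyzer {
      sealed trait Expr {
        def children: Seq[Expr]
        def resolved: Boolean
        def dataType: String
        def withNewChildren(newChildren: Seq[Expr]): Expr
      }

      // Stand-in for an unresolved column reference such as 'ss.
      case class UnresolvedAttr(name: String) extends Expr {
        def children: Seq[Expr] = Nil
        def resolved: Boolean = false
        def dataType: String = throw new IllegalStateException(
          s"Invalid call to dataType on unresolved object, tree: '$name")
        def withNewChildren(newChildren: Seq[Expr]): Expr = this
      }

      // Stand-in for a column after resolution.
      case class ResolvedAttr(name: String, dataType: String) extends Expr {
        def children: Seq[Expr] = Nil
        def resolved: Boolean = true
        def withNewChildren(newChildren: Seq[Expr]): Expr = this
      }

      // Stand-in for the wrapped expression, which validates its own input.
      case class Wrapped(input: Expr) extends Expr {
        def children: Seq[Expr] = Seq(input)
        def resolved: Boolean = input.resolved
        def dataType: String = "timestamp"
        def inputIsString: Boolean = input.dataType == "string"
        def withNewChildren(newChildren: Seq[Expr]): Expr = copy(input = newChildren.head)
      }

      // Stand-in for the replaceable wrapper: only `child` is a child; `left` is kept for display.
      case class Replaceable(left: Expr, child: Expr) extends Expr {
        def children: Seq[Expr] = Seq(child)
        def resolved: Boolean = child.resolved
        def dataType: String = child.dataType
        def withNewChildren(newChildren: Seq[Expr]): Expr = copy(child = newChildren.head)
      }

      // Toy analyzer: resolves attributes by walking children only.
      def resolve(e: Expr, schema: Map[String, String]): Expr = e match {
        case UnresolvedAttr(name) => ResolvedAttr(name, schema(name))
        case other => other.withNewChildren(other.children.map(resolve(_, schema)))
      }

      private val column = UnresolvedAttr("ss")

      val analyzed: Replaceable =
        resolve(Replaceable(column, Wrapped(column)), Map("ss" -> "string")) match {
          case r: Replaceable => r
          case other => sys.error(s"unexpected expression: $other")
        }

      // Validation through the resolved child succeeds.
      val childValidates: Boolean = analyzed.child match {
        case w: Wrapped => w.inputIsString
        case _ => false
      }

      // Validation through `left` fails, because `left` was never resolved.
      val leftCheck: Try[String] = Try(analyzed.left.dataType)
    }
    ```

    In this model `ToyAnalyzer.analyzed.child` is resolved and validates its own input, while
    `ToyAnalyzer.leftCheck` is a failure carrying the "Invalid call to dataType" message. That
    mirrors the reasoning for dropping the wrapper's own input check and relying on the wrapped
    expression's validation instead.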


