spark-reviews mailing list archives

From rednaxelafx <...@git.apache.org>
Subject [GitHub] spark pull request #20757: [SPARK-23595][SQL] ValidateExternalType should su...
Date Wed, 07 Mar 2018 22:28:39 GMT
Github user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20757#discussion_r173006281
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1408,11 +1409,37 @@ case class ValidateExternalType(child: Expression, expected: DataType)
     
       override def dataType: DataType = RowEncoder.externalDataTypeForInput(expected)
     
    -  override def eval(input: InternalRow): Any =
    -    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    -
       private val errMsg = s" is not a valid external type for schema of ${expected.simpleString}"
     
    +  private lazy val checkType = expected match {
    +    case _: DecimalType =>
    +      (value: Any) => {
    +        Seq(classOf[java.math.BigDecimal], classOf[scala.math.BigDecimal], classOf[Decimal])
    +          .exists { x => value.getClass.isAssignableFrom(x) }
    +      }
    +    case _: ArrayType =>
    +      (value: Any) => {
    +        value.getClass.isAssignableFrom(classOf[Seq[_]]) || value.getClass.isArray
    --- End diff --
    
    Hi guys, sorry I'm late.
    
    In your new code you're doing:
    ```diff
    +    case _: ArrayType =>
    +      (value: Any) => {
    +        value.getClass.isArray || value.isInstanceOf[Seq[_]]
    +      }
    ```
    which is good. The `xxx.getClass().isAssignableFrom(some_class_literal)` check in the old version
of this PR is actually backwards; it should have been `some_class_literal.isAssignableFrom(xxx.getClass())`,
e.g.
    ```
    scala> classOf[String].isAssignableFrom(classOf[Object])
    res0: Boolean = false
    
    scala> classOf[Object].isAssignableFrom(classOf[String])
    res1: Boolean = true
    ```
    and the latter is semantically the same as `xxx.isInstanceOf[some_class]` for a non-null `xxx`. `isInstanceOf[]`
is guaranteed to be at least as fast as `some_class_literal.isAssignableFrom(xxx.getClass())`,
and in general it is faster.
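    
    To make the direction issue concrete with actual values rather than class literals, here is a minimal
sketch. The object and method names (`AssignableDirection`, `backwards`, `corrected`, `viaInstanceOf`)
are made up for illustration and are not part of the PR:
    ```scala
    // Hypothetical helper object, for illustration only; not code from the PR.
    object AssignableDirection {
      // Backwards form from the older revision: asks whether a Seq could be
      // assigned to a variable of the value's runtime class.
      def backwards(value: Any): Boolean =
        value.getClass.isAssignableFrom(classOf[Seq[_]])

      // Corrected form: asks whether the value's runtime class is a Seq subtype.
      def corrected(value: Any): Boolean =
        classOf[Seq[_]].isAssignableFrom(value.getClass)

      // Equivalent to `corrected` for non-null values.
      def viaInstanceOf(value: Any): Boolean =
        value.isInstanceOf[Seq[_]]
    }
    ```
    With these definitions, `backwards(List(1, 2))` is false because the runtime class of a non-empty
`List` is a subclass of `Seq`, not a superclass, while `backwards(new Object)` is true. `corrected`
and `viaInstanceOf` both accept the `List` and reject a `String`.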
    
    `xxx.getClass().isArray()` has a fixed overhead, whereas `isInstanceOf[]` has a fast
path that is slightly faster than `isArray` and a slow path that can be much slower than it.
So putting the `isArray` check first, as your new code does, makes more sense to me.

