spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From liancheng <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-4785] [SQL] Support udf instance ser/de...
Date Tue, 09 Dec 2014 08:53:01 GMT
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/3640#issuecomment-66251380
  
    Appreciate a lot for fixing this case! The serialization wrapper class makes sense. However,
would like to make some refactoring. A summary of my comments above:
    
    1. Renaming `HiveFunctionCache` to `HiveFunctionWrapper`
    2. Moving `HiveFunctionWrapper` to the shim layer, and keep Hive 0.12.0 code path exactly
the same as before.
    
       Considering Hive 0.12.0 is not affected by this issue, and 1.2.0 release is really
close, I'd like to lower the possibility of breaking 0.12.0 code paths as much as possible.

    
    3. Add a `HiveSimpleUdfWrapper`, which inherits from `HiveFunctionWrapper`.
    
       As you've mentioned in the code, `HiveSimpleUdf` is a special case, it shouldn't be
serialized, and should always create a fresh object on executor side. Currently this special
case complicates `HiveFunctionWrapper` implementation and makes it somewhat confusing. Defining
a special subclass for `HvieSimpleUdf` helps making `HiveFunctionWrapper` simpler (e.g. no
need to serialize the boolean flag any more).
    
    4. Avoid using `Utilities.de/serializePlan` by mimicking `Utilities.de/serializeObjectByKryo`
in the `read/writeExternal` methods of the wrapper class, so that we can remove the expensive
`HiveConf` instantiation.
    
    In a word, we can add two classes, `HiveFunctionWrapper` and `HiveSImpleUdfWrapper` into
the shim layer, and make sure that the 0.12.0 version behaves exactly the same as before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Mime
View raw message