spark-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-6747) Support List<> as a return type in Hive UDF
Date Tue, 07 Apr 2015 17:12:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro updated SPARK-6747:
------------------------------------
    Description: 
The current implementation can't handle List<> as the return type of a Hive UDF.
Consider the UDF below:

public class UDFToListString extends UDF {
    public List<String> evaluate(Object o) {
        return Arrays.asList("xxx", "yyy", "zzz");
    }
}

A scala.MatchError is thrown as follows when the UDF is used:

scala.MatchError: interface java.util.List (of class java.lang.Class)
	at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
	at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
	at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
	at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
	at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
	at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...

To fix this problem, we need to add an entry for List<> in HiveInspectors#javaClassToDataType.
However, this is difficult because of type erasure in the JVM.
Suppose the lines below are appended to HiveInspectors#javaClassToDataType:

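    // list type (requires: import java.lang.reflect.ParameterizedType)
    case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
      // The first generic interface of List is Collection<E>, so the
      // type argument found here is the type variable E, not String.
      val tpe = c.getGenericInterfaces()(0).asInstanceOf[ParameterizedType]
      println(tpe.getActualTypeArguments()(0).toString)  // prints "E"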

This logic fails to recover the component type of List<>: it yields the type variable E rather than the String declared by the UDF.
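
The erasure problem described above can be reproduced outside Spark with plain Java reflection. The sketch below is illustrative only and is not taken from the issue: ErasureDemo, fromClass and fromMethod are hypothetical names, and the nested UDFToListString is a stand-in that does not extend Hive's UDF class, so that the example compiles without Hive on the classpath. It contrasts what the Class object for java.util.List exposes with what the generic signature of the evaluate method exposes. Whether the second approach suits HiveInspectors is an assumption, not something the report states.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

public class ErasureDemo {
    // Stand-in for the UDF in the report (assumption: no Hive dependency here).
    public static class UDFToListString {
        public List<String> evaluate(Object o) {
            return Arrays.asList("xxx", "yyy", "zzz");
        }
    }

    // Inspect only the Class object for java.util.List, as in the Scala
    // snippet above. The first generic interface of List is parameterized by
    // the type variable E, so the element type of a concrete List<String>
    // is not available here.
    static String fromClass() {
        ParameterizedType tpe =
            (ParameterizedType) List.class.getGenericInterfaces()[0];
        return tpe.getActualTypeArguments()[0].toString();
    }

    // Inspect the generic return type recorded in the method signature.
    // The declared type argument (String) is kept in the class file metadata.
    static String fromMethod() {
        try {
            Method m = UDFToListString.class.getMethod("evaluate", Object.class);
            Type ret = m.getGenericReturnType();
            ParameterizedType tpe = (ParameterizedType) ret;
            return tpe.getActualTypeArguments()[0].getTypeName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromClass());   // prints "E"
        System.out.println(fromMethod());  // prints "java.lang.String"
    }
}
```

The difference is that a java.lang.Class value carries no information about the type arguments used at a particular declaration site, while a java.lang.reflect.Method does retain the declared generic return type.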

  was:
The current implementation can't handle List<> as the return type of a Hive UDF.
Consider the UDF below:

public class UDFToListString extends UDF {
    public List<String> evaluate(Object o) {
        return Arrays.asList("xxx", "yyy", "zzz");
    }
}

A scala.MatchError is thrown as follows when the UDF is used:

scala.MatchError: interface java.util.List (of class java.lang.Class)
	at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
	at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
	at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
	at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
	at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
	at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
...

To fix this problem, we need to add an entry for List<> in HiveInspectors#javaClassToDataType.



> Support List<> as a return type in Hive UDF
> -------------------------------------------
>
>                 Key: SPARK-6747
>                 URL: https://issues.apache.org/jira/browse/SPARK-6747
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Takeshi Yamamuro
>
> The current implementation can't handle List<> as the return type of a Hive UDF.
> Consider the UDF below:
> public class UDFToListString extends UDF {
>     public List<String> evaluate(Object o) {
>         return Arrays.asList("xxx", "yyy", "zzz");
>     }
> }
> A scala.MatchError is thrown as follows when the UDF is used:
> scala.MatchError: interface java.util.List (of class java.lang.Class)
> 	at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
> 	at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
> 	at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
> 	at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
> 	at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
> 	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
> 	at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
> 	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
> 	at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
> ...
> To fix this problem, we need to add an entry for List<> in HiveInspectors#javaClassToDataType.
> However, this is difficult because of type erasure in the JVM.
> Suppose the lines below are appended to HiveInspectors#javaClassToDataType:
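>     // list type (requires: import java.lang.reflect.ParameterizedType)
>     case c: Class[_] if c == classOf[java.util.List[java.lang.Object]] =>
>       // The first generic interface of List is Collection<E>, so the
>       // type argument found here is the type variable E, not String.
>       val tpe = c.getGenericInterfaces()(0).asInstanceOf[ParameterizedType]
>       println(tpe.getActualTypeArguments()(0).toString)  // prints "E"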
> This logic fails to recover the component type of List<>: it yields the type variable E rather than the String declared by the UDF.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

