hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8722) Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
Date Thu, 18 Dec 2014 14:51:13 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Xuefu Zhang updated HIVE-8722:
------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: HIVE-9134

> Enhance InputSplitShims to extend InputSplitWithLocationInfo [Spark Branch]
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-8722
>                 URL: https://issues.apache.org/jira/browse/HIVE-8722
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jimmy Xiang
>
> We got the following exception in hive.log:
> {noformat}
> 2014-11-03 11:45:49,865 DEBUG rdd.HadoopRDD
> (Logging.scala:logDebug(84)) - Failed to use InputSplitWithLocations.
> java.lang.ClassCastException: Cannot cast
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit
> to org.apache.hadoop.mapred.InputSplitWithLocationInfo
>         at java.lang.Class.cast(Class.java:3094)
>         at org.apache.spark.rdd.HadoopRDD.getPreferredLocations(HadoopRDD.scala:278)
>         at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216)
>         at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:216)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:215)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1303)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1313)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1312)
> {noformat}
> My understanding is that the split location info helps Spark execute tasks more efficiently.
> This could help other execution engines too, so we should consider enhancing InputSplitShim
> to implement InputSplitWithLocationInfo if possible.
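The enhancement the issue asks for can be illustrated with a small sketch. This is not Hive's actual shim code: `InputSplitWithLocationInfo` and `SplitLocationInfo` below are simplified, self-contained stand-ins for the real `org.apache.hadoop.mapred.InputSplitWithLocationInfo` and `org.apache.hadoop.mapreduce.SplitLocationInfo` types, and `CombineSplitSketch` is a hypothetical split standing in for `CombineHiveInputSplit`. The point is that once the split implements the interface, Spark's `HadoopRDD.getPreferredLocations` can cast it successfully instead of throwing the `ClassCastException` shown above.

```java
import java.io.IOException;
import java.util.Arrays;

// Simplified stand-in for org.apache.hadoop.mapreduce.SplitLocationInfo.
class SplitLocationInfo {
    private final String location;
    private final boolean inMemory;

    SplitLocationInfo(String location, boolean inMemory) {
        this.location = location;
        this.inMemory = inMemory;
    }

    String getLocation() { return location; }
    boolean isInMemory() { return inMemory; }
}

// Simplified stand-in for org.apache.hadoop.mapred.InputSplitWithLocationInfo.
interface InputSplitWithLocationInfo {
    SplitLocationInfo[] getLocationInfo() throws IOException;
}

// Hypothetical combine split that exposes its hosts the way
// CombineHiveInputSplit could after the proposed enhancement.
class CombineSplitSketch implements InputSplitWithLocationInfo {
    private final String[] hosts;

    CombineSplitSketch(String[] hosts) { this.hosts = hosts; }

    @Override
    public SplitLocationInfo[] getLocationInfo() {
        // Report every host holding this split's data as an on-disk
        // (not in-memory) replica; the scheduler uses these as
        // preferred task locations.
        return Arrays.stream(hosts)
                .map(h -> new SplitLocationInfo(h, false))
                .toArray(SplitLocationInfo[]::new);
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        InputSplitWithLocationInfo split =
                new CombineSplitSketch(new String[] {"node1", "node2"});
        for (SplitLocationInfo info : split.getLocationInfo()) {
            System.out.println(info.getLocation()
                    + " inMemory=" + info.isInMemory());
        }
    }
}
```

With location info available, the DAG scheduler can place tasks on the nodes that actually hold the split's blocks, which is the efficiency gain the reporter describes.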



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
