spark-issues mailing list archives

From "Weiqing Yang (JIRA)" <>
Subject [jira] [Commented] (SPARK-6628) ClassCastException occurs when executing sql statement "insert into" on hbase table
Date Mon, 15 May 2017 23:19:04 GMT


Weiqing Yang commented on SPARK-6628:

We ran into this issue too.

The main issue is a ClassCastException: HiveHBaseTableOutputFormat cannot be cast to HiveOutputFormat.
The reason is:
public interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {…}

public class HiveHBaseTableOutputFormat extends
    TableOutputFormat<ImmutableBytesWritable> implements
    OutputFormat<ImmutableBytesWritable, Object> {...}

From the two snippets above, we can see that HiveHBaseTableOutputFormat and HiveOutputFormat both extend/implement OutputFormat, but neither is a subtype of the other, so an instance of one cannot be cast to the other.
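To see why the cast fails at runtime, here is a minimal Java sketch that mirrors the two hierarchies. `OutputFormat`, `HiveOutputFormat`, `HBaseLikeOutputFormat`, and `FileLikeOutputFormat` are simplified, hypothetical stand-ins, not the real Hadoop/Hive classes. The only point being illustrated is that a class implementing the parent interface is not automatically an instance of a sibling sub-interface, so a downcast to that sub-interface throws.

```java
public class CastDemo {
    // Simplified stand-ins for the types in the snippets above.
    // These are NOT the real Hadoop or Hive classes.
    interface OutputFormat<K, V> {}

    // Mirrors: HiveOutputFormat<K, V> extends OutputFormat<K, V>
    interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {}

    // Mirrors HiveHBaseTableOutputFormat: implements OutputFormat only,
    // so it is a sibling of HiveOutputFormat, not a subtype of it.
    static class HBaseLikeOutputFormat implements OutputFormat<Object, Object> {}

    // Hypothetical format that does implement HiveOutputFormat.
    static class FileLikeOutputFormat implements HiveOutputFormat<Object, Object> {}

    // Performs an unconditional downcast to HiveOutputFormat and reports
    // whether it succeeded instead of letting the exception escape.
    static boolean castSucceeds(OutputFormat<?, ?> format) {
        try {
            HiveOutputFormat<?, ?> hive = (HiveOutputFormat<?, ?>) format;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        OutputFormat<?, ?> hbase = new HBaseLikeOutputFormat();
        System.out.println(hbase instanceof HiveOutputFormat);        // false
        System.out.println(castSucceeds(hbase));                      // false
        System.out.println(castSucceeds(new FileLikeOutputFormat())); // true
    }
}
```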

Spark initializes the output format in SparkHiveWriterContainer in Spark 1.6, 2.0, and 2.1 (or in HiveFileFormat in Spark 2.2/master):
@transient private lazy val outputFormat =
        jobConf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
Note that this cast assumes the output format is a {color:red}HiveOutputFormat{color}.
However, when users write data into an HBase table, the output format is HiveHBaseTableOutputFormat, which is not an instance of HiveOutputFormat, so the cast throws a ClassCastException.
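One way to avoid the unconditional cast is to keep the value typed as the common supertype and take a HiveOutputFormat-specific path only when an `instanceof` check passes. The sketch below is hypothetical: the stand-in types and the `asHiveOutputFormat` and `chooseWritePath` helpers are illustrative, and this is not Spark's actual fix.

```java
import java.util.Optional;

public class GuardedOutputFormat {
    // Simplified stand-ins, NOT the real Hadoop/Hive types.
    interface OutputFormat<K, V> {}
    interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {}
    static class HBaseLikeOutputFormat implements OutputFormat<Object, Object> {}
    static class FileLikeOutputFormat implements HiveOutputFormat<Object, Object> {}

    // Instead of an unconditional cast (the equivalent of asInstanceOf),
    // return a HiveOutputFormat view only when the format really implements it.
    static Optional<HiveOutputFormat<?, ?>> asHiveOutputFormat(OutputFormat<?, ?> format) {
        if (format instanceof HiveOutputFormat) {
            return Optional.of((HiveOutputFormat<?, ?>) format);
        }
        return Optional.empty();
    }

    // Hypothetical dispatch: a Hive-specific path for HiveOutputFormat and a
    // generic OutputFormat path for everything else (e.g. the HBase format).
    static String chooseWritePath(OutputFormat<?, ?> format) {
        return asHiveOutputFormat(format)
                .map(h -> "hive-output-format")
                .orElse("generic-output-format");
    }

    public static void main(String[] args) {
        System.out.println(chooseWritePath(new FileLikeOutputFormat()));  // hive-output-format
        System.out.println(chooseWritePath(new HBaseLikeOutputFormat())); // generic-output-format
    }
}
```

The sketch chooses a fallback path rather than failing the task. Whether a generic OutputFormat path is sufficient for HBase writes in Spark is an assumption that this sketch does not verify.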

I am going to submit a PR for this.

> ClassCastException occurs when executing sql statement "insert into" on hbase table
> -----------------------------------------------------------------------------------
>                 Key: SPARK-6628
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: meiyoula
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in stage 3.0 (TID 12, vm-17): java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to
>         at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72)
>         at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71)
>         at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91)
>         at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115)
>         at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84)
>         at$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at
>         at org.apache.spark.executor.Executor$
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$

This message was sent by Atlassian JIRA
