spark-issues mailing list archives

From "KaiXu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19725) different parquet dependency in spark2.0.x and Hive2.x cause failure of HoS when using parquet file format
Date Sat, 25 Feb 2017 11:53:44 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXu updated SPARK-19725:
--------------------------
    Description: 
The Parquet version in Hive 2.x is 1.8.1 while in Spark 2.0.x it is 1.7.0, so running Hive-on-Spark (HoS) queries against Parquet-format tables hits JAR conflicts:

Starting Spark Job = d1f6825c-48ea-45b8-9614-4266f2d1f0bd
Job failed with java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
java.util.concurrent.ExecutionException: Exception thrown by job
        at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
        at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
        at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, hsx-node7): java.lang.RuntimeException: Error processing row: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;
        at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:149)
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
        at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
        at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
        at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1976)
        at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1976)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;
        at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convertType(HiveSchemaConverter.java:100)
        at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convertType(HiveSchemaConverter.java:56)
        at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convertTypes(HiveSchemaConverter.java:50)
        at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convert(HiveSchemaConverter.java:39)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:115)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:286)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:271)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:553)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:137)
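The NoSuchMethodError descriptor above, Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;, corresponds to the schema-builder API shape shipped in Parquet 1.8.x, which Parquet 1.7.0 does not provide. A quick way to see which Parquet copy wins on a given JVM classpath is a reflection probe. This diagnostic is an editor's sketch, not part of the report:

```java
// ParquetApiCheck.java -- probe the classpath for the Parquet 1.8.x builder
// method named in the stack trace. Sketch only; not from the original report.
public class ParquetApiCheck {

    public static String check() {
        try {
            Class<?> builder =
                Class.forName("org.apache.parquet.schema.Types$BasePrimitiveBuilder");
            java.lang.reflect.Method m = builder.getMethod("length", int.class);
            // The failing descriptor requires length(int) to RETURN
            // BasePrimitiveBuilder, so check the return type too: a method with
            // the same name but a different return type still throws
            // NoSuchMethodError at link time.
            if (m.getReturnType().getName()
                    .equals("org.apache.parquet.schema.Types$BasePrimitiveBuilder")) {
                return "length(int) matches: Parquet >= 1.8 is first on the classpath";
            }
            return "length(int) has a different return type: an older Parquet is first on the classpath";
        } catch (ClassNotFoundException e) {
            return "Parquet classes not on classpath";
        } catch (NoSuchMethodException e) {
            return "length(int) missing: Parquet 1.7.x is first on the classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(check());
    }
}
```

Running this with Spark's jars first should report the 1.7.x shape, and with Hive's jars first the 1.8.x shape, which is the clash the report describes.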

  was:
the parquet version in hive2.x is 1.8.1 while in spark2.x is 1.7.0, so when run HoS queries
using parquet file format would encounter some jars conflict problems:

(stack trace identical to the one in the description above)

        Summary: different parquet dependency in spark2.0.x and Hive2.x cause failure of HoS when using parquet file format  (was: different parquet dependency in spark2.x and Hive2.x cause failure of HoS when using parquet file format)

> different parquet dependency in spark2.0.x and Hive2.x cause failure of HoS when using parquet file format
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19725
>                 URL: https://issues.apache.org/jira/browse/SPARK-19725
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: spark2.0.2
> hive2.2
> hadoop2.7.1
>            Reporter: KaiXu
>
> The Parquet version in Hive 2.x is 1.8.1 while in Spark 2.0.x it is 1.7.0, so running Hive-on-Spark (HoS) queries against Parquet-format tables hits JAR conflicts:
> (stack trace identical to the one in the description above)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

