From reviews-return-1025053-archive-asf-public=cust-asf.ponee.io@spark.apache.org Tue Jan 28 16:28:39 2020
Return-Path:
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 843DC180658 for ; Tue, 28 Jan 2020 17:28:39 +0100 (CET)
Received: (qmail 47078 invoked by uid 500); 28 Jan 2020 16:28:39 -0000
Mailing-List: contact reviews-help@spark.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list reviews@spark.apache.org
Received: (qmail 47066 invoked by uid 99); 28 Jan 2020 16:28:38 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Jan 2020 16:28:38 +0000
From: GitBox
To: reviews@spark.apache.org
Subject: [GitHub] [spark] heuermh commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
Message-ID: <158022891876.5845.5423483738908185653.gitbox@gitbox.apache.org>
References:
In-Reply-To:
Date: Tue, 28 Jan 2020 16:28:38 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

heuermh commented on issue #26804: [WIP][SPARK-26346][BUILD][SQL] Upgrade parquet to 1.11.0
URL: https://github.com/apache/spark/pull/26804#issuecomment-579335798

> Also how well does this work with the Avro that we use? I know that's always a problem area, but maybe it's behind us now.

This pull request upgrades the Avro transitive dependency version to 1.9.1 without upgrading the Spark Avro dependency version, which is 1.8.2.
This will cause runtime exceptions such as
```
Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.as(Lorg/apache/parquet/schema/LogicalTypeAnnotation;)Lorg/apache/parquet/schema/Types$Builder;
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:161)
    at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:226)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:182)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:141)
    at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:244)
    at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:135)
    at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:126)
    at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:121)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:388)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349)
    at org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
    at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.initWriter(SparkHadoopWriter.scala:350)
    at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:120)
    at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
    at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

Perhaps if the `parquet-avro` test scope dependency did not exclude the Avro 1.9.1 transitive dependencies these runtime issues would show up in Spark unit tests rather than in downstream projects. I am testing this hypothesis today.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
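
A `NoSuchMethodError` like the one in the trace above means a class was compiled against a method that is absent from the version actually loaded at runtime. The following is a minimal sketch, not part of the original comment, of how one might check a given classpath for that method by reflection. The class `ClasspathCheck` and its helpers `load`, `locationOf`, and `check` are hypothetical names introduced for illustration; the two Parquet class names and the method name `as` are taken from the trace.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; the names here are illustrative, not from Spark or Parquet.
final class ClasspathCheck {

    // Reports where a class was loaded from; JDK classes usually have no code source.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(no code source)"
                : src.getLocation().toString();
    }

    // Loads a class without running its static initializers; returns null if it is absent.
    static Class<?> load(String name) {
        try {
            return Class.forName(name, false, ClasspathCheck.class.getClassLoader());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    // Describes whether a public one-argument method owner.method(param) is visible.
    static String check(String owner, String method, String param) {
        Class<?> ownerCls = load(owner);
        if (ownerCls == null) {
            return "MISSING_CLASS " + owner;
        }
        Class<?> paramCls = load(param);
        if (paramCls == null) {
            return "MISSING_CLASS " + param;
        }
        // getMethods() includes public methods inherited from superclasses.
        for (Method m : ownerCls.getMethods()) {
            Class<?>[] p = m.getParameterTypes();
            if (m.getName().equals(method) && p.length == 1 && p[0] == paramCls) {
                return "PRESENT in " + locationOf(ownerCls);
            }
        }
        return "MISSING_METHOD " + owner + "." + method + "(" + param + ")";
    }

    public static void main(String[] args) {
        // Class and method names copied from the NoSuchMethodError in the trace.
        System.out.println(check(
                "org.apache.parquet.schema.Types$PrimitiveBuilder",
                "as",
                "org.apache.parquet.schema.LogicalTypeAnnotation"));
    }
}
```

Run with the same classpath as the failing job, the printed line would indicate whether the method, or one of the two classes, is missing, and otherwise which location the owner class was loaded from.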