From: "Alexander Filipchik (Jira)"
To: commits@hudi.apache.org
Reply-To: dev@hudi.apache.org
Date: Fri, 20 Mar 2020 15:55:00 +0000 (UTC)
Subject: [jira] [Commented] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

[ https://issues.apache.org/jira/browse/HUDI-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063471#comment-17063471 ]

Alexander Filipchik commented on HUDI-721:
------------------------------------------

Serialization works on staging with the fix, but the job can't complete due to HUDI-722.

> AvroConversionUtils is broken for complex types in 0.6
> ------------------------------------------------------
>
> Key: HUDI-721
> URL: https://issues.apache.org/jira/browse/HUDI-721
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Common Core
> Reporter: Alexander Filipchik
> Priority: Major
> Fix For: 0.6.0
>
>
> Hi,
> I was working on the upgrade from 0.5 to 0.6 and hit a bug in AvroConversionUtils. I originally blamed it on the Spark Parquet-to-Avro schema generator (the convertStructTypeToAvroSchema method), but after some debugging I'm pretty sure the issue is somewhere in AvroConversionHelper.
> What happens: when a complex type is extracted using SqlTransformer (using select bla from ), where bla is a complex type with arrays of structs, Kryo serialization breaks with:
>
> {code:java}
> 28701 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (isEmpty at DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.avro.UnresolvedUnionException: Not in union
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
> 	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
> 	at org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
> 	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
> 	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
> 	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
> 	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
> 	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
> 	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
> 	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:351)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> 28702 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 1 failed: isEmpty at DeltaSync.java:337, took 12.149897 s
> 28702 [main] ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Got error running delta sync once.
Shutting down
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"order_item_detail","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items","fields":[{"name":"external_id","type":[{"type":"record","name":"external_id","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"id","type":["string","null"]},{"name":"display_id","type":["string","null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"name","type":["string","null"]},{"name":"sale_price","type":[{"type":"record","name":"sale_price","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"currency_code","type":["string","null"]},{"name":"units","type":["long","null"]},{"name":"nanos","type":["int","null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"quantity","type":["int","null"]},{"name":"note","type":["string","null"]},{"name":"customer_item_id","type":["string","null"]},{"name":"menu_customer_item_id","type":["string","null"]},{"name":"entity_path","type":[{"type":"record","name":"entity_path","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"path_nodes","type":[{"type":"array","items":{"type":"record","name":"path_nodes","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail.entity_path","fields":[{"name":"id","type":["string","null"]},{"name":"type","type":["string","null"]},{"name":"exist","type":["boolean","null"]}]}},"null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"exist","type":["boolean","null"]}]},"null"]: {"external_id": null, "name": "Item 0", "sale_price": {"currency_code": "KRW", "units": 900, "nanos": 0, "exist": null}, "quantity": 1, "note": "Item 0 note", "customer_item_id": "37a49c46-42dd-4306-8ea5-e542bdfc0b0c", "menu_customer_item_id": "", "entity_path": null, "exist": null}
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
> 	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
> 	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
> 	at org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
> 	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
> 	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
> 	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
> 	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
> 	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
> 	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
> 	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:351)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
>
> For a problematic payload, the lookup
> Integer i = union.getIndexNamed(getSchemaName(datum))
> fails:
> union.getIndexNamed(getSchemaName(datum)) returns null.
> getSchemaName(datum) returns: hoodie.source.hoodie_source.order.customer_items.customer_items.order_item_detail
> but the union's schema is:
> {code:java}
> {"type":"record","name":"order_item_detail",
> "namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items"
> {code}
> customer_items.customer_items is repeated in the result of getSchemaName.
> union.getIndexNamed("hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail")
> returns the proper index.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
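
The failed lookup in the report comes down to an exact string match on the record's full name (namespace plus name). Below is a minimal sketch of that behavior, assuming a union keeps a map from each branch's full name to its index. UnionNameIndex is a hypothetical stand-in written for illustration, not Avro's own class; only the two full names are taken from the report.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a union's name index (not Avro's class). */
final class UnionNameIndex {
    // Maps each branch's full name to its position in the union.
    private final Map<String, Integer> indexByFullName = new LinkedHashMap<>();

    UnionNameIndex(List<String> branchFullNames) {
        for (int i = 0; i < branchFullNames.size(); i++) {
            indexByFullName.put(branchFullNames.get(i), i);
        }
    }

    /** Returns the branch index for an exact full-name match, or null if there is none. */
    Integer getIndexNamed(String fullName) {
        return indexByFullName.get(fullName);
    }

    public static void main(String[] args) {
        // Full name of the record branch as declared in the union's schema (from the report).
        String declared =
            "hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail";
        // Full name reported for the datum's schema; the namespace differs (from the report).
        String fromDatum =
            "hoodie.source.hoodie_source.order.customer_items.customer_items.order_item_detail";

        UnionNameIndex union = new UnionNameIndex(Arrays.asList(declared, "null"));

        // No exact match: the lookup yields null, the case that ends in "Not in union".
        System.out.println(union.getIndexNamed(fromDatum)); // prints "null"
        // Exact match on the declared full name: the record branch is found.
        System.out.println(union.getIndexNamed(declared));  // prints "0"
    }
}
```

Under this assumption, any difference in the generated namespace between the datum's schema and the union's branch is enough to make the record unresolvable, even when the fields are identical.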