Date: Wed, 18 Mar 2020 20:06:00 +0000 (UTC)
From: "Alexander Filipchik (Jira)"
To: commits@hudi.apache.org
Reply-To: dev@hudi.apache.org
Subject: [jira] [Created] (HUDI-721) AvroConversionUtils is broken for complex types in 0.6

Alexander Filipchik created HUDI-721:
----------------------------------------

Summary: AvroConversionUtils is broken for complex types in 0.6
Key: HUDI-721
URL: https://issues.apache.org/jira/browse/HUDI-721
Project: Apache Hudi
(incubating)
Issue Type: Bug
Reporter: Alexander Filipchik

Hi, I was working on the upgrade from 0.5 to 0.6 and hit a bug in AvroConversionUtils. I originally blamed it on the Spark Parquet-to-Avro schema generator (the convertStructTypeToAvroSchema method), but after some debugging I'm pretty sure the issue is somewhere in AvroConversionHelper.

What happens: when a complex type is extracted using SqlTransformer (using select bla from ..., where bla is a complex type with arrays of structs), Kryo serialization breaks with:

{code:java}
28701 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (isEmpty at DeltaSync.java:337) failed in 12.146 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.avro.UnresolvedUnionException: Not in union
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:351)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
28702 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 1 failed: isEmpty at DeltaSync.java:337, took 12.149897 s
28702 [main] ERROR org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Got error running delta sync once. Shutting down
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"order_item_detail","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items","fields":[{"name":"external_id","type":[{"type":"record","name":"external_id","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"id","type":["string","null"]},{"name":"display_id","type":["string","null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"name","type":["string","null"]},{"name":"sale_price","type":[{"type":"record","name":"sale_price","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"currency_code","type":["string","null"]},{"name":"units","type":["long","null"]},{"name":"nanos","type":["int","null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"quantity","type":["int","null"]},{"name":"note","type":["string","null"]},{"name":"customer_item_id","type":["string","null"]},{"name":"menu_customer_item_id","type":["string","null"]},{"name":"entity_path","type":[{"type":"record","name":"entity_path","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail","fields":[{"name":"path_nodes","type":[{"type":"array","items":{"type":"record","name":"path_nodes","namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail.entity_path","fields":[{"name":"id","type":["string","null"]},{"name":"type","type":["string","null"]},{"name":"exist","type":["boolean","null"]}]}},"null"]},{"name":"exist","type":["boolean","null"]}]},"null"]},{"name":"exist","type":["boolean","null"]}]},"null"]: {"external_id": null, "name": "Item 0", "sale_price": {"currency_code": "KRW", "units": 900, "nanos": 0, "exist": null}, "quantity": 1, "note": "Item 0 note", "customer_item_id": "37a49c46-42dd-4306-8ea5-e542bdfc0b0c", "menu_customer_item_id": "", "entity_path": null, "exist": null}
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:192)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:120)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:125)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
	at org.apache.spark.serializer.GenericAvroSerializer.serializeDatum(GenericAvroSerializer.scala:125)
	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:159)
	at org.apache.spark.serializer.GenericAvroSerializer.write(GenericAvroSerializer.scala:47)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:361)
	at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
	at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:351)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:456)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

For a problematic payload, the lookup

Integer i = union.getIndexNamed(getSchemaName(datum))

fails: union.getIndexNamed(getSchemaName(datum)) returns null.

getSchemaName(datum) returns:

hoodie.source.hoodie_source.order.customer_items.customer_items.order_item_detail

but the union's schema is:

{code:java}
{"type":"record","name":"order_item_detail",
"namespace":"hoodie.source.hoodie_source.order.customer_order.customer_items"
{code}

customer_items.customer_items is repeated in the result of getSchemaName, where the union's schema has customer_order.customer_items.

union.getIndexNamed("hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail") returns the proper index.
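To make the mismatch concrete, here is a minimal sketch in plain Java with no Avro dependency. It assumes the union resolves a record branch by an exact match on the record's full name (namespace plus name), which is consistent with the null result observed above. The class UnionNameLookup and its methods are hypothetical stand-ins written for illustration; they are not Avro or Hudi code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a union's branch index keyed by full name.
// Assumption: branches are found by exact string match on the full name.
final class UnionNameLookup {
    private final Map<String, Integer> indexByFullName = new LinkedHashMap<>();

    public UnionNameLookup(List<String> branchFullNames) {
        for (int i = 0; i < branchFullNames.size(); i++) {
            indexByFullName.put(branchFullNames.get(i), i);
        }
    }

    // Returns the branch index, or null when no branch has this exact full name.
    public Integer getIndexNamed(String fullName) {
        return indexByFullName.get(fullName);
    }

    // Full name of a record: namespace and name joined by a dot.
    public static String fullName(String namespace, String name) {
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        // Branches of the union from the report: the record and "null".
        UnionNameLookup union = new UnionNameLookup(Arrays.asList(
                fullName("hoodie.source.hoodie_source.order.customer_order.customer_items",
                         "order_item_detail"),
                "null"));

        // Name reported for the datum: customer_items appears twice.
        String datumName =
                "hoodie.source.hoodie_source.order.customer_items.customer_items.order_item_detail";
        // Name that matches the union's record schema.
        String schemaName =
                "hoodie.source.hoodie_source.order.customer_order.customer_items.order_item_detail";

        System.out.println("datum name  -> " + union.getIndexNamed(datumName));   // null
        System.out.println("schema name -> " + union.getIndexNamed(schemaName));  // 0
    }
}
```

Under that assumption, any difference in a single namespace segment is enough to make the lookup miss, so the record written by the converter has to carry exactly the namespace that the target schema declares.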