MIME-Version: 1.0
References:
In-Reply-To:
From: Susu Dong
Date: Thu, 24 Jun 2021 13:47:56 +0900
Message-ID:
Subject: Re: issue while reading archived commit written by 0.5 version with 0.8 version
To: dev
Content-Type: multipart/alternative; boundary="000000000000bdc36205c57bba49"

--000000000000bdc36205c57bba49
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Aakash,

Deleting the old commit files should not have much of an impact, since you are unlikely to use them again once they have been archived successfully, and you have already deleted some of the archived files yourself. 😅

However, I went back and dug through the codebase again. A fix was recently merged into master and is expected to ship in 0.9.0; it should be a better solution to this problem than manual intervention. If you are interested, you can take a look at the fix here: https://github.com/apache/hudi/pull/2677. We will be *skipping* the deserialization of inflight commit files and *only* deserializing completed commit files. As you can see, your problem is caused by archiving 20200715192915.rollback.inflight, which is an inflight commit file. We aren't particularly interested in the content of those inflight files, so we decided to change the archival logic this way.

Failure to archive the commit files should not impede your usage of Hudi; it should continue to function properly. However, if you do care about a clean running status for your pipeline, feel free to build a 0.9.0 SNAPSHOT version and use it.

Hope it helps. :)

Best,
Susu

On Thu, Jun 24, 2021 at 12:32 AM aakash aakash wrote:

> Hi Susu,
>
> Thanks for the response. Can you please explain what the impact of
> deleting these commit files is?
>
> Thanks!
>
> On Wed, Jun 23, 2021 at 8:09 AM Susu Dong wrote:
>
> > Hi Aakash,
> >
> > I believe there were schema-level changes from Hudi 0.5.0 to 0.6.0
> > regarding those commit files.
> > So if you are jumping from 0.5.0 to 0.8.0 directly, you will likely
> > hit this error, i.e. "Failed to archive commits". You shouldn't need
> > to delete archived files; instead, you should try deleting some, if
> > not all, of the active commit files under your *.hoodie* folder. The
> > reason is that 0.8.0 uses a new Avro schema to parse your old commit
> > files, which is why you got the failure. Can you try the above
> > approach and let us know? Thank you. :)
> >
> > Best,
> > Susu
> >
> > On Wed, Jun 23, 2021 at 12:21 PM aakash aakash wrote:
> >
> > > Hi,
> > >
> > > I am trying to use Hudi 0.8 with Spark 3.0 in my prod environment;
> > > earlier we were running Hudi 0.5 with Spark 2.4.4.
> > >
> > > While updating a very old index, I am getting this error:
> > >
> > > *From the logs, it seems to error out while reading this file:
> > > hudi/.hoodie/archived/.commits_.archive.119_1-0-1 in S3*
> > >
> > > 21/06/22 19:18:06 ERROR HoodieTimelineArchiveLog: Failed to archive commits, .commit file: 20200715192915.rollback.inflight
> > > java.io.IOException: Not an Avro data file
> > >     at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> > >     at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
> > >     at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
> > >     at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
> > >     at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
> > >     at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
> > >     at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
> > >     at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
> > >     at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
> > >     at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:479)
> > >
> > > Is this a backward compatibility issue? I have deleted a few archive
> > > files, but the problem persists, so it does not look like a file
> > > corruption issue.
> > >
> > > Regards,
> > > Aakash

--000000000000bdc36205c57bba49--
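
As a rough illustration of the archival change described in the reply above (skip inflight instants, deserialize only completed ones), here is a minimal Python sketch. It is not Hudi's actual implementation: the function names, the suffix list, and every file name except 20200715192915.rollback.inflight are hypothetical, chosen only for this example.

```python
# Hypothetical sketch of "skip inflight, deserialize only completed" filtering.
# Assumption: pending instants can be recognised by their file-name suffix.
PENDING_SUFFIXES = (".inflight", ".requested")


def is_completed_instant(file_name: str) -> bool:
    """Return True when the timeline file does not look like a pending instant."""
    return not file_name.endswith(PENDING_SUFFIXES)


def instants_to_deserialize(file_names):
    """Keep only the files whose content would be deserialized during archival."""
    return [name for name in file_names if is_completed_instant(name)]


# Only the first name comes from the thread; the others are made up.
timeline = [
    "20200715192915.rollback.inflight",
    "20200715192915.rollback",
    "20200716101500.commit",
    "20200716103000.commit.requested",
]

print(instants_to_deserialize(timeline))
```

Under these assumptions, the inflight rollback file from the error log is filtered out before any Avro parsing is attempted, so an unreadable inflight file could no longer fail the archival step.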