hudi-dev mailing list archives

From Susu Dong <susudo...@gmail.com>
Subject Re: issue while reading archived commit written by 0.5 version with 0.8 version
Date Thu, 24 Jun 2021 04:47:56 GMT
Hi Aakash,

Deleting the old commit files should not have much of an impact, since you
are unlikely to use them again once they have been archived successfully.
You have also already deleted some of the archived files yourself. 😅

However, I went back and dug through the codebase again. A fix was recently
merged into master and is expected to ship in 0.9.0; it should be a better
solution to this problem than manual intervention.
Specifically, you can take a look at the fix here:
https://github.com/apache/hudi/pull/2677, if you are interested.
With that fix, archival will *skip* the deserialization of inflight commit
files and deserialize *only* completed commit files. As you can see, your
problem is caused by archiving 20200715192915.rollback.inflight, which is an
inflight commit file. We aren't particularly interested in the content of
those inflight files; thus, we have decided to modify the archival logic this
way.
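
To make the idea concrete, here is a minimal sketch of that kind of filter.
It is not the code from the PR: the function names are hypothetical, and it
assumes that pending instants can be recognized purely by an ".inflight" or
".requested" file-name suffix.

```python
# Illustrative sketch only; these helpers are hypothetical, not Hudi's API.
# Assumption: a pending (not yet completed) instant file can be recognized
# by its ".inflight" or ".requested" suffix.
PENDING_SUFFIXES = (".inflight", ".requested")


def is_completed_instant(file_name: str) -> bool:
    """Return True if the timeline file name looks like a completed instant."""
    return not file_name.endswith(PENDING_SUFFIXES)


def instants_to_deserialize(file_names):
    """Keep only the files whose content archival would need to parse."""
    return [name for name in file_names if is_completed_instant(name)]


if __name__ == "__main__":
    timeline = [
        "20200715192915.rollback.inflight",  # the file from the error log
        "20200715192915.rollback",
        "20200716101010.commit",             # made-up instant for illustration
        "20200717111111.clean.requested",    # made-up instant for illustration
    ]
    print(instants_to_deserialize(timeline))
```

With the sample list above, only 20200715192915.rollback and
20200716101010.commit remain, so the inflight rollback file that triggered
"Not an Avro data file" is never opened.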

A failure to archive the commit files should not impede your use of Hudi,
which should continue to function properly. However, if you do care about
your pipeline running cleanly, feel free to build your own 0.9.0 SNAPSHOT
version and swap it in. Hope it helps. :)

Best,
Susu


On Thu, Jun 24, 2021 at 12:32 AM aakash aakash <email2aakash@gmail.com>
wrote:

> Hi Susu,
>
> Thanks for the response. Can you please explain what the impact of
> deleting these commit files is?
>
> Thanks!
>
> On Wed, Jun 23, 2021 at 8:09 AM Susu Dong <susudong5@gmail.com> wrote:
>
> > Hi Aakash,
> >
> > I believe there were schema-level changes from Hudi 0.5.0 to 0.6.0
> > regarding those commit files. So if you are jumping from 0.5.0 to 0.8.0
> > right away, you will likely experience such an error, i.e. Failed to
> > archive commits. You shouldn't need to delete archived files; instead,
> > you should try deleting some, if not all, active commit files under your
> > *.hoodie* folder. The reason is that 0.8.0 uses a new Avro schema to
> > parse your old commit files, which is why you got the failure. Can you
> > try the above approach and let us know? Thank you. :)
> >
> > Best,
> > Susu
> >
> > On Wed, Jun 23, 2021 at 12:21 PM aakash aakash <email2aakash@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to use Hudi 0.8 with Spark 3.0 in my prod environment;
> > > earlier we were running Hudi 0.5 with Spark 2.4.4.
> > >
> > > While updating a very old index, I am getting this error:
> > >
> > > *From the logs, it seems to error out while reading this file:
> > > hudi/.hoodie/archived/.commits_.archive.119_1-0-1 in S3*
> > >
> > > 21/06/22 19:18:06 ERROR HoodieTimelineArchiveLog: Failed to archive
> > > commits, .commit file: 20200715192915.rollback.inflight
> > > java.io.IOException: Not an Avro data file
> > > at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
> > > at org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:175)
> > > at org.apache.hudi.client.utils.MetadataConversionUtils.createMetaWrapper(MetadataConversionUtils.java:84)
> > > at org.apache.hudi.table.HoodieTimelineArchiveLog.convertToAvroRecord(HoodieTimelineArchiveLog.java:370)
> > > at org.apache.hudi.table.HoodieTimelineArchiveLog.archive(HoodieTimelineArchiveLog.java:311)
> > > at org.apache.hudi.table.HoodieTimelineArchiveLog.archiveIfRequired(HoodieTimelineArchiveLog.java:128)
> > > at org.apache.hudi.client.AbstractHoodieWriteClient.postCommit(AbstractHoodieWriteClient.java:430)
> > > at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:186)
> > > at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:121)
> > > at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:479)
> > >
> > >
> > > Is this a backward compatibility issue? I have deleted a few archive
> > > files, but the problem persists, so it does not look like a file
> > > corruption issue.
> > >
> > > Regards,
> > > Aakash
> > >
> >
>
