hudi-dev mailing list archives

From nishith agarwal <n3.nas...@gmail.com>
Subject Re: Schema compatibility
Date Tue, 25 Jun 2019 18:11:16 GMT
Hi Katie,

Thanks for explaining the problem in detail. Could you give us some more
information so I can help you with this?

1. What table type are you using: COPY_ON_WRITE or MERGE_ON_READ?
2. Could you paste the exception you see in Hudi?
3. "Despite the schema having full compatibility" -> Can you explain what you
mean by "full compatibility"? (There is a small sketch just below of how Avro
itself defines and checks this.)
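
For reference, a minimal sketch of what Avro itself calls full compatibility:
the new schema must be able to read data written with the old one, and the old
schema must be able to read data written with the new one. The .avsc file names
below are placeholders, not anything taken from your setup.

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class CompatibilityCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder file names -- point these at your two schema versions.
    Schema oldSchema = new Schema.Parser().parse(new File("schema_v1.avsc"));
    Schema newSchema = new Schema.Parser().parse(new File("schema_v2.avsc"));

    // Backward compatible: a reader using the NEW schema can decode data
    // that was written with the OLD schema.
    SchemaCompatibilityType backward = SchemaCompatibility
        .checkReaderWriterCompatibility(newSchema, oldSchema).getType();

    // Forward compatible: a reader using the OLD schema can decode data
    // that was written with the NEW schema. Full compatibility = both.
    SchemaCompatibilityType forward = SchemaCompatibility
        .checkReaderWriterCompatibility(oldSchema, newSchema).getType();

    System.out.println("backward: " + backward + ", forward: " + forward);
  }
}

If either direction reports INCOMPATIBLE, the evolution is not fully compatible
even though each schema may be perfectly valid on its own.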

Thanks,
Nishith

On Tue, Jun 25, 2019 at 10:32 AM Katie Frost <katiesfrost95@gmail.com>
wrote:

> Hi,
>
> I've been using the Hudi DeltaStreamer to create datasets in S3, and I've
> had issues with Hudi acknowledging schema compatibility.
>
> I'm trying to run a Spark job ingesting Avro data into a Hudi dataset in S3,
> with the raw Avro source data also stored in S3. The raw Avro data has two
> different schema versions, and I have supplied the job with the latest
> schema. However, the job fails to ingest any of the data that is not up to
> date with the latest schema and ingests only the data matching the given
> schema, despite the schema having full compatibility. Is this a known
> issue, or just a case of missing some configuration?
>
> The error I get when running the job against the data not up to date with
> the latest Avro schema is an ArrayIndexOutOfBoundsException. I know it is a
> schema issue because I have tested running the job with the older schema
> version, after removing any data that matches the latest schema, and the job
> runs successfully.
>
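
One note in the meantime: an ArrayIndexOutOfBoundsException during decoding is
a classic symptom of Avro bytes being read with a schema that does not match
the one they were written with. Avro resolves schema evolution only when the
decoder knows both the writer schema and the reader schema; if records written
with the older schema are decoded as if they carried the newer one, the field
layout misaligns. Below is a minimal, self-contained sketch of that difference.
The v1/v2 "User" schemas are invented for illustration and are not your actual
schemas.

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionDemo {
  // Invented v1/v2 schemas: v2 adds a nullable "email" field with a default.
  static final Schema V1 = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"name\",\"type\":\"string\"}]}");
  static final Schema V2 = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"email\",\"type\":[\"null\",\"string\"],\"default\":null}]}");

  public static void main(String[] args) throws Exception {
    // Encode one record with the OLD (v1) schema, like the older files in S3.
    GenericRecord rec = new GenericData.Record(V1);
    rec.put("id", 1L);
    rec.put("name", "katie");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(V1).write(rec, enc);
    enc.flush();

    // Correct: writer schema = v1, reader schema = v2. Avro's schema resolution
    // fills the new "email" field from its default value.
    BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericRecord resolved =
        new GenericDatumReader<GenericRecord>(V1, V2).read(null, dec);
    System.out.println(resolved);

    // Incorrect: pretending the bytes were written with v2. Walking the wrong
    // field layout is what typically ends in IndexOutOfBounds/EOF-style errors.
    // new GenericDatumReader<GenericRecord>(V2).read(null,
    //     DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
  }
}

Once we have the exact stack trace and table type from the questions above, it
should be clearer whether this is the failure mode you are hitting or something
else in the DeltaStreamer path.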
