pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [pulsar] sijie commented on issue #3741: POJO AvroSchema always allowNull
Date Tue, 05 Mar 2019 02:13:00 GMT
sijie commented on issue #3741: POJO AvroSchema always allowNull
URL: https://github.com/apache/pulsar/issues/3741#issuecomment-469506795
   > Given a POJO generated by Avro, there is no way to determine whether this POJO was
generated with a schema that allowed or not for null types.
   > but that's separate from the point that that alone won't guarantee we can generate
the correct schema starting from the generated POJO.
   The issue I am creating here is about `AllowNull`. We found that `AllowNull` is a problem in
two use cases: 1) the one reported by @skyrocknroll, and 2) the other use case from @codelipenghui.
   The whole picture of @skyrocknroll's problem is: Avro schema file => Avro-generated POJO.
The schema from the Avro file is not compatible with the schema that Pulsar's `AvroSchema`
generates from the generated POJO. One of the problems is that `AllowNull` completely changes the
schema definition. Whether removing `AllowNull` can fix this problem is a separate issue to be
addressed, although I would expect Avro to handle it well. We shouldn't couple the discussion of
that issue with the broader issue introduced by `AllowNull`.
   > It's a 100% correct solution for that case. I don't see what's limited about that.
   The generated POJO is the use case that was reported and discussed in the Slack channel. If you
only handle generated POJOs by using `getClassSchema`, you do not cover the many other data sources
that generate Avro schemas using `ReflectData`.
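   A minimal JDK-only sketch of the `getClassSchema` shortcut being discussed: reuse the schema that
a generated class already carries, and fall back to reflection for everything else. `GeneratedUser`,
`PlainUser` and `SchemaSource` are hypothetical names, and the embedded schema is a plain `String`
here instead of an `org.apache.avro.Schema` so the example needs no Avro dependency. The fallback
branch is exactly the set of POJOs that the shortcut does not cover.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an Avro-generated class. Real generated classes expose a static
// getClassSchema() returning org.apache.avro.Schema; a String is used here to stay JDK-only.
class GeneratedUser {
    static String getClassSchema() {
        return "{\"type\":\"record\",\"name\":\"GeneratedUser\","
                + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}";
    }
    String name;
}

// Hypothetical plain POJO, e.g. a class shared with Spark or Flink jobs. No embedded schema.
class PlainUser {
    String name;
}

class SchemaSource {
    // Prefer the schema embedded in a generated class; otherwise report the reflection fallback.
    static String resolve(Class<?> clazz) {
        try {
            Method m = clazz.getDeclaredMethod("getClassSchema");
            if (Modifier.isStatic(m.getModifiers())) {
                return "embedded: " + m.invoke(null);
            }
        } catch (NoSuchMethodException e) {
            // Not a generated class: there is no embedded schema to reuse.
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read embedded schema of " + clazz, e);
        }
        // Everything reaching this point still depends on how reflection derives the schema.
        return "reflection: " + clazz.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(resolve(GeneratedUser.class)); // takes the embedded-schema path
        System.out.println(resolve(PlainUser.class));     // prints "reflection: PlainUser"
    }
}
```

   The point of the sketch is the second branch: any POJO that was not produced by the Avro
compiler never reaches the embedded-schema path, so its schema is whatever the reflection-based
generator (with or without `AllowNull`) produces.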
   > the contract you are creating when generating an avro schema using the ReflectData
api is with the java class itself not some other tool or system. 
   I am not creating any contract; these come from real use cases. Also, the whole discussion is
about POJOs only. It is not a cross-language issue or a user-customized schema issue, even though
it was found in a cross-language setting.
   1) A user starts from an Avro schema file (let's call this schema A) and uses the Avro tools to
generate a POJO class. The user then uses that POJO class with Pulsar's `AvroSchema` to generate
another schema, B. Ideally, A and B should be compatible.
   schema A => generated pojo => (pulsar AvroSchema) => schema B
   schema A => generated pojo => ReflectData.AllowNull.parse => schema B
   `AllowNull` is what prevents them from being compatible. I don't know whether removing
`AllowNull` can fully address this problem; that's a specific issue to address for Avro-generated
POJOs. I hope Avro provides the right tools to convert back and forth between schemas and POJOs;
otherwise, IMO, it is a problem in Avro.
   If we removed `AllowNull`, the flow would change to the following. The scope of the problem is
then different: whether `ReflectData` is the right tool to handle Avro-generated POJOs, or whether
Avro even provides tools that guarantee such conversions.
   schema A => generated pojo => ReflectData.parse => schema B
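   To make the two flows concrete, here is a toy model using only the JDK. It is not Avro's
`ReflectData`; it models just one rule, assumed here to match `ReflectData.AllowNull`: every
non-primitive field is wrapped in a union with `null`, while primitive fields are left alone.
`UserProfile`, `ToySchemaGen` and `AllowNullDemo` are hypothetical, and field types are simplified
to Avro-like JSON strings.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical POJO standing in for a class generated from schema A.
class UserProfile {
    String name; // "string" in schema A
    int age;     // "int" in schema A
}

// Toy model only (NOT org.apache.avro.reflect.ReflectData): maps instance fields to
// Avro-like type strings. With allowNull = true, non-primitive fields become ["null", T].
class ToySchemaGen {
    static String baseType(Class<?> t) {
        if (t == int.class || t == Integer.class) return "\"int\"";
        if (t == long.class || t == Long.class) return "\"long\"";
        if (t == boolean.class || t == Boolean.class) return "\"boolean\"";
        if (t == String.class) return "\"string\"";
        throw new IllegalArgumentException("unsupported field type: " + t);
    }

    static Map<String, String> fieldTypes(Class<?> clazz, boolean allowNull) {
        Map<String, String> types = new LinkedHashMap<>();
        for (Field f : clazz.getDeclaredFields()) {
            // Skip static and compiler-generated fields; they are not record fields.
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            String type = baseType(f.getType());
            if (allowNull && !f.getType().isPrimitive()) {
                type = "[\"null\"," + type + "]";
            }
            types.put(f.getName(), type);
        }
        return types;
    }
}

class AllowNullDemo {
    // Field types of schema A, as written in the original schema file.
    static Map<String, String> schemaA() {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("name", "\"string\"");
        a.put("age", "\"int\"");
        return a;
    }

    public static void main(String[] args) {
        Map<String, String> withAllowNull = ToySchemaGen.fieldTypes(UserProfile.class, true);
        Map<String, String> plain = ToySchemaGen.fieldTypes(UserProfile.class, false);
        // Map.equals ignores insertion order, so this compares field names and types only.
        System.out.println("A matches AllowNull-style B? " + schemaA().equals(withAllowNull)); // false
        System.out.println("A matches plain B? " + schemaA().equals(plain)); // true
    }
}
```

   For this two-field POJO, the `AllowNull` flow turns `name` into `["null","string"]` while schema A
says `"string"`, and dropping the nullable wrapping makes B line up with A again. Whether that holds
for every Avro-generated POJO is the separate question raised above.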
   2) Imagine I have a data system A (e.g. Spark or Flink) and Pulsar, and a POJO class (e.g.
`UserProfile`) that is shared across the whole organization. The schemas generated by the different
systems are completely incompatible, even though they use the same POJO. When data flows between
Pulsar and other systems, it might not be processed properly because of the incompatible schemas.
Pulsar is the message bus for exchanging data between other systems. If it produces a schema that
is incompatible with the other systems, IMO that's a very serious bug.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
