avro-user mailing list archives

From Andy Chambers <achambers.h...@gmail.com>
Subject Re: Specify non-empty array, map, etc.
Date Thu, 11 May 2017 16:48:19 GMT
I think the question you need to answer is what there is to gain by
adding this constraint. (This goes for any writer-side constraint.)

Each constraint you add makes it harder to write data using that schema.

Why not just handle the empty case where you consume the data?
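
As a rough illustration of handling the empty case on the consuming side, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of Avro or Kafka; it assumes the array field has already been decoded into a java.util.List.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical consumer-side guard; none of these names come from Avro or Kafka.
final class EmptyCaseHandling {
    private EmptyCaseHandling() {}

    /**
     * Returns the decoded list when it has at least one element, or an empty
     * Optional when the list is null or empty, so the caller decides whether
     * to skip, log, or reject the record.
     */
    static <T> Optional<List<T>> nonEmpty(List<T> decoded) {
        if (decoded == null || decoded.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(decoded);
    }
}
```

A consumer could then call something like EmptyCaseHandling.nonEmpty(clients).ifPresent(...) and treat the absent case however the application requires, without touching the schema or the writer.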

Once you start adding custom datum writers, all bets are off with respect
to schema compatibility, so if you're relying on something like the
Confluent Schema Registry, you're in trouble.

On 11 May 2017 4:35 pm, "Joseph P." <joseph.pachod@gmail.com> wrote:


You can add props to your Avro schema.

Here we have added our own custom props, plus extra processing before
generating the Avro binary to make sure these props are respected.

Pro: very flexible (we have added max_length on strings, temporal_format,
and so forth).
Con: you must be sure your extra processing runs before generating the
Avro binaries.

For example, in your case you could add a prop "nonEmpty" with a default
value of false.

Then, before converting the Avro JSON/POJO to Avro binary, you use your own
datum writer (extending SpecificDatumWriter), and in writeField you check
for the presence of the prop and its value; if it is true, you check the
field value for non-emptiness.
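
The check described above could look roughly like the following sketch. It is deliberately independent of Avro so that it stands on its own: the field's props are modeled as a plain Map<String, String>, which is an assumption standing in for whatever the schema exposes (for string-valued props, Schema.Field.getProp is one way to read them). NonEmptyCheck and its method are hypothetical names, and wiring the call into an overridden writeField is left out.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical helper; the props map stands in for the custom attributes of an Avro field.
final class NonEmptyCheck {
    static final String PROP = "nonEmpty";

    private NonEmptyCheck() {}

    /**
     * Throws IllegalArgumentException when the field is marked nonEmpty=true
     * and the value is an empty collection, map, or string.
     * A null value is not treated as empty here; nullability is left to the
     * schema (for example a union with "null").
     */
    static void check(String fieldName, Map<String, String> fieldProps, Object value) {
        // A missing prop is treated as "false", matching the suggested default.
        if (!Boolean.parseBoolean(fieldProps.get(PROP))) {
            return;
        }
        boolean empty =
            (value instanceof Collection && ((Collection<?>) value).isEmpty())
                || (value instanceof Map && ((Map<?, ?>) value).isEmpty())
                || (value instanceof CharSequence && ((CharSequence) value).length() == 0);
        if (empty) {
            throw new IllegalArgumentException(
                "Field '" + fieldName + "' is marked " + PROP + " but is empty");
        }
    }
}
```

Calling this before the field is encoded means an empty array fails at write time and never reaches the topic, at the cost of every producer having to use the custom writer.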


On Wed, May 10, 2017 at 10:41 AM, Tianxiang Xiong <tianxiang.xiong@fundingcircle.com> wrote:

> Thanks Suraj, but that's not what I mean.
> For your second schema, it is possible to pass in an empty array `[]`
> containing no elements. I would like to prevent that.
> On 8 May 2017 at 19:32, Suraj Acharya <suraj@apache.org> wrote:
>> This is what I have done in my application:
>> {"name": "clients", "type": [ {"type": "array", "items": "Client"}, "null" ]}
>> This allows me to pass null. What you can try is something like this:
>> {"name": "info", "type": { "type": "array", "items": "Information" }}
>> In this example, info is something that needs to be passed for every
>> client.
>> Hope that helps.
>> On Fri, May 5, 2017 at 9:51 PM, Tianxiang Xiong <tianxiang.xiong@fundingcircle.com> wrote:
>>> In Avro 1.7.7, is there a way to specify a *non-empty* array, map,
>>> etc.? There doesn't seem to be according to the spec
>>> <https://avro.apache.org/docs/1.7.7/spec.html#Maps>.
>>> There are applications in which we mandate that a data format has a
>>> non-empty array. It'd be nice if that could be expressed in the schema so
>>> data with empty arrays fails to serialize (and is thus never put on a
>>> Kafka topic). Fail earlier > fail later.
>>> Thanks,
>>> TX
> --
> *Tianxiang Xiong*
> *tianxiang.xiong@fundingcircle.com*
> 747 Front Street, Floor 4 | San Francisco, CA 94111
