avro-user mailing list archives

From Joseph Pachod <joseph.pac...@resurgences.com>
Subject Re: Specify non-empty array, map, etc.
Date Fri, 12 May 2017 08:00:05 GMT
Actually, regarding the Confluent Schema Registry, I'm not sure I get the
point: props are a valid part of a schema and are stored in the schema
registry just fine.

So the question is rather whether you can be sure of always sitting in the
path of the Avro binary generation. Personally, I'm always cautious with
third parties, so Avro sits behind a wrapper of ours and thus we are sure of
being in that path.
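
For illustration, the pre-serialization check described in the quoted message below might look roughly like the following sketch. It uses Python with only the standard library rather than an Avro `SpecificDatumWriter` subclass, so it is an analogy and not our actual code. The schema, the `ClientBatch` record, `check_non_empty`, `checked_serialize` and the `serialize` callable are all hypothetical names, and `nonEmpty` is a custom prop, not something defined by the Avro spec.

```python
import json

# Hypothetical schema: "nonEmpty" is a custom field prop, not part of the Avro spec.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "ClientBatch",
  "fields": [
    {"name": "clients",
     "type": {"type": "array", "items": "string"},
     "nonEmpty": true},
    {"name": "tags",
     "type": {"type": "map", "values": "string"}}
  ]
}
"""


class ConstraintViolation(ValueError):
    """Raised when a datum breaks a custom schema prop."""


def check_non_empty(schema: dict, datum: dict) -> None:
    """Reject the datum if a field flagged nonEmpty holds an empty value.

    A missing prop is treated as false, matching the default suggested
    in the thread.
    """
    for field in schema["fields"]:
        if field.get("nonEmpty", False) is not True:
            continue
        value = datum.get(field["name"])
        # Covers arrays (list), maps (dict) and strings; None also fails.
        if value is None or len(value) == 0:
            raise ConstraintViolation(
                f"field '{field['name']}' must not be empty"
            )


def checked_serialize(schema: dict, datum: dict, serialize):
    """Run the custom checks, then delegate to the real encoder.

    `serialize` is a placeholder for whatever produces the Avro binary.
    """
    check_non_empty(schema, datum)
    return serialize(datum)


schema = json.loads(SCHEMA_JSON)
```

The wrapper only helps if every write goes through something like `checked_serialize`; any code that calls the encoder directly bypasses the check.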

2017-05-12 8:24 GMT+02:00 Tianxiang Xiong <tianxiang.xiong@fundingcircle.com>:

> @Andy Adding these constraints at the schema level prevents bad data from
> making it onto Kafka topics in the first place, preventing data pollution.
> I don't know what you mean by "making it harder to write data using that
> schema"--imposing and enforcing constraints is kind of the point.
>
> > Why not just handle the empty case where you consume the data?
>
> That's what we currently do, but we wouldn't have to have this extra test
> case if we could impose the aforementioned constraint at the schema level.
>
> Right now, we treat messages with an empty array as erroneous, and output
> a corresponding message onto an error topic. If we reset our application
> and consumed messages again, we'd be putting new messages onto the error
> topic, *doubling* the unwanted data.
>
> @Joseph That's an interesting approach. I know that Avro is extensible,
> but we're relying on some third-party serde classes, and as @Andy mentions,
> once you start getting into the weeds all bets are off.
>
> On 11 May 2017 at 09:48, Andy Chambers <achambers.home@gmail.com> wrote:
>
>> I think the question you need to ask/answer is what is there to gain by
>> adding this constraint. (This goes for any writer constraint)
>>
>> Each constraint you add makes it harder to write data using that schema.
>>
>> Why not just handle the empty case where you consume the data?
>>
>> Once you start adding custom datum writers, all bets are off with respect
>> to schema compatibility so if you're using/trusting something like the
>> confluent schema registry you're in trouble.
>>
>> On 11 May 2017 4:35 pm, "Joseph P." <joseph.pachod@gmail.com> wrote:
>>
>> Hi
>>
>> You can add props to your Avro schema.
>>
>> Here we have added our own custom props, plus extra processing before
>> generating the Avro binary to make sure these props are respected.
>>
>> Pros: very flexible (we have added max_length on string, temporal_format
>> and so forth...).
>> Cons: you must be sure your extra processing runs before the Avro
>> binaries are generated.
>>
>> For example, in your case you could add a prop "nonEmpty" with a default
>> value of false.
>>
>> Then, before converting the Avro JSON/POJO to Avro binary, you use your
>> own datum writer (extending SpecificDatumWriter), and in writeField you
>> check for the presence of the prop and its value; if it is true, you
>> check that the field is non-empty.
>>
>> Cheers
>>
>>
>> On Wed, May 10, 2017 at 10:41 AM, Tianxiang Xiong <
>> tianxiang.xiong@fundingcircle.com> wrote:
>>
>>> Thanks Suraj, but that's not what I mean.
>>>
>>> For your second schema, it is possible to pass in an empty array `[]`
>>> containing no elements. I would like to prevent that.
>>>
>>> On 8 May 2017 at 19:32, Suraj Acharya <suraj@apache.org> wrote:
>>>
>>>> This is what I have done in my application :
>>>>
>>>> {"name": "clients", "type": [ {"type": "array", "items": "Client"}, "null" ]}
>>>>
>>>> This allows me to pass null. What you can try is something like this :
>>>>
>>>> {"name": "info", "type": { "type": "array", "items": "Information" } }
>>>>
>>>> In this example, info is something that needs to be passed for every
>>>> client.
>>>>
>>>> Hope that helps.
>>>>
>>>>
>>>> On Fri, May 5, 2017 at 9:51 PM, Tianxiang Xiong <
>>>> tianxiang.xiong@fundingcircle.com> wrote:
>>>>
>>>>> In Avro 1.7.7, is there a way to specify a *non-empty* array, map,
>>>>> etc.? There doesn't seem to be according to the spec
>>>>> <https://avro.apache.org/docs/1.7.7/spec.html#Maps>.
>>>>>
>>>>> There are applications in which we mandate that a data format has a
>>>>> non-empty array. It'd be nice if that could be expressed in the schema so
>>>>> data with empty arrays fail to serialize (and are thus never put on a
>>>>> Kafka topic). Fail earlier > fail later.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> TX
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Tianxiang Xiong*
>>>
>>> *tianxiang.xiong@fundingcircle.com <tianxiang.xiong@fundingcircle.com>*
>>>
>>> 747 Front Street, Floor 4 | San Francisco, CA 94111
>>>
>>
>>
>>
>



-- 

Joseph PACHOD
Software architect

*joseph.pachod@berger-levrault.com <joseph.pachod@berger-levrault.com>*

0811 696 386

*www.berger-levrault.com* <http://www.berger-levrault.com/>
