avro-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Avro vs Json
Date Tue, 14 Aug 2012 00:31:53 GMT
This is consistent with my experience. As a user of HDFS, I would find data
produced by others and not know its semantics well enough to use it. On-board
schemas, with comments, make this data more usable, although a system like
HCatalog also helps with this kind of discovery.
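As a rough illustration of an "on-board schema with comments": Avro record schemas are themselves JSON documents in which records and fields may carry a `doc` attribute, and Avro data files store the writer's schema in the file header. The sketch below assumes a hypothetical `PageView` record (the namespace, field names and docs are made up) held as a plain Python dict, and prints a field listing that a downstream consumer could read without asking the producer. It uses only the standard library, not the Avro Python API.

```python
import json

# Hypothetical Avro record schema. The "doc" attributes are the comments
# that travel with the data. Names and docs are illustrative only.
PAGE_VIEW_SCHEMA = {
    "type": "record",
    "name": "PageView",
    "namespace": "com.example.logs",
    "doc": "One row per page rendered by the web tier.",
    "fields": [
        {"name": "user_id", "type": "long",
         "doc": "Internal numeric user id; 0 for logged-out visitors."},
        {"name": "url", "type": "string",
         "doc": "Requested path, without query string."},
        # Union with null first, so a null default is allowed.
        {"name": "referrer", "type": ["null", "string"], "default": None,
         "doc": "HTTP Referer header, if present."},
    ],
}


def describe(schema: dict) -> str:
    """Render a record schema's docs as a human-readable field listing.

    Assumes every field type is a primitive name or a union of primitive
    names (no nested records), which is enough for this sketch.
    """
    lines = [f"{schema['name']}: {schema.get('doc', '(no doc)')}"]
    for field in schema["fields"]:
        ftype = field["type"]
        # A union such as ["null", "string"] is shown as "null|string".
        type_name = "|".join(ftype) if isinstance(ftype, list) else ftype
        lines.append(f"  {field['name']} ({type_name}): {field.get('doc', '(no doc)')}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The schema is plain JSON, so it can be shipped with (or inside) the data.
    print(json.dumps(PAGE_VIEW_SCHEMA, indent=2))
    print(describe(PAGE_VIEW_SCHEMA))
```

Keeping the documentation in the schema, rather than in a wiki page or in someone's head, means it is present wherever the data file goes.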

Avro enables and encourages the preparation of shared data sets among
users, which saves cycles and improves productivity.

Russell Jurney http://datasyndrome.com

On Aug 13, 2012, at 4:00 PM, Bill Graham <billgraham@gmail.com> wrote:

> It is worth keeping in mind that an explicit external schema is another
> cost, not just in designing but also in maintaining the system. As such,
> it is most useful for closely coupled internal systems, where one
> controls both ends. This may be the case for computing pipelines that a
> single team owns.

Our experience has been quite the opposite. When the developer producing
data was the same as the developer writing code to consume it, JSON worked
fine, since the developer knew which fields to expect. As our company grew,
this turned into tribal knowledge and the approach did not scale. That's
when having schemas is critical: when one team produces data and many
others consume it. The cost is that the producer needs to publish the
schema so that others can discover it.

On Mon, Aug 13, 2012 at 10:50 AM, Tatu Saloranta <tsaloranta@gmail.com> wrote:

> On Sun, Aug 12, 2012 at 8:03 PM, Russell Jurney
> <russell.jurney@gmail.com> wrote:
> > To be fair, you can test types as you parse JSON. But only a few.
> ...
> The difference between typed formats with an external, explicit schema and
> schema-free formats (where a schema is optional, as in JSON) is similar to
> the difference between statically and dynamically typed languages.
> Testing and error handling differ, as do the trade-offs.
> -+ Tatu +-
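To make "you can test types as you parse JSON, but only a few" concrete: after parsing, JSON yields only strings, numbers, booleans, null, arrays and objects. The sketch below uses the standard library's `json.loads` with an `object_hook` to check each decoded object against expected Python types. The field names, `EXPECTED`, `TypeCheckError` and `parse` are hypothetical. Note what it cannot check: a JSON number gives no way to tell Avro's `int` from `long`, and there is no native bytes, enum or fixed type, all of which an Avro schema declares explicitly.

```python
import json

# Expected Python types per field for a hypothetical flat record.
# json.loads can only produce str, int/float, bool, None, list or dict.
EXPECTED = {"user_id": int, "url": str, "referrer": (str, type(None))}


class TypeCheckError(ValueError):
    """Raised when a decoded JSON object does not match EXPECTED."""


def check_record(obj: dict) -> dict:
    """object_hook: json.loads calls this for every decoded JSON object."""
    for name, expected in EXPECTED.items():
        if name not in obj:
            raise TypeCheckError(f"missing field {name!r}")
        value = obj[name]
        # bool is a subclass of int in Python; no field here is boolean,
        # so reject bools explicitly instead of letting true pass as an int.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeCheckError(
                f"{name!r} has unexpected type {type(value).__name__}")
    return obj


def parse(line: str) -> dict:
    """Parse one JSON record, type-checking it during decoding."""
    return json.loads(line, object_hook=check_record)


if __name__ == "__main__":
    print(parse('{"user_id": 42, "url": "/home", "referrer": null}'))
    try:
        parse('{"user_id": "42", "url": "/home", "referrer": null}')
    except TypeCheckError as err:
        print("rejected:", err)
```

The checks live in consumer code, so every consumer has to repeat them; a declared schema moves that knowledge to the producer's side.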
