avro-dev mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Promote to string
Date Thu, 10 Feb 2011 18:28:09 GMT


On 2/4/11 1:16 PM, "Philip Zeyliger" <philip@cloudera.com> wrote:

>On Fri, Feb 4, 2011 at 10:02 AM, Scott Carey
><scott@richrelevance.com>wrote:
>
>> I have been thinking about more advanced type promotion in Avro after
>> facing more complicated schema evolution issues.
>
>
>My two cents:
>
>This way lies madness.  Avro (and PB and Thrift) give you some basic tools
>to evolve an API without doing much extra code.  At some point, you end up
>forking and creating an APIv2, and eventually deprecate APIv1.  If you try
>to make that magical, you'll end up building a programming language.

I agree that a protocol's APIv1 versus APIv2 is an example where exotic
conversions don't make a lot of sense.  The schemas in a protocol API
aren't persisted long term; they are only on the wire.

My use cases are in long term persisted file data, where schema evolution
spans a much longer time window (forever unless I can re-write all data).
Having file format v1 be incompatible with file format v2 is a lot
harder to swallow than API v1 being incompatible with API v2.

I have another use case in mind as well.  Schema transformation is a
common need for interoperation with other frameworks. Cascading doesn't
support nested records (or it didn't last I looked), so a Cascading Tap
has to either flatten them or not support them.  Pig doesn't support
unions, so they are either not supported, or manipulated into non-union
structures.  Schema transformation is a common use case when integrating
Avro with pre-existing systems.
When working on Pig and Hive adapter prototypes, there turned out to be a
lot of overlap and repeated work -- and it's almost all in schema
transformation (flattening, unions, etc), classification (recursive?), and
translation.
If there was a general helper library for this sort of work, then the
remaining adapter would be rather small and not require so much Avro
domain knowledge.
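
To make the flattening case concrete, here is a minimal sketch of what one
such helper might look like.  It is not an existing Avro API: the names
flatten_record and flatten_schema, the "_" separator, and the sample User
schema are all made up for illustration, and the schema is handled as a plain
parsed-JSON dict.  Unions and arrays that contain records are left untouched.

```python
def flatten_record(schema, prefix="", sep="_"):
    """Return a flat list of field definitions for a record schema (a dict).

    Hypothetical helper: nested record fields are inlined and their names are
    prefixed with the parent field name. Named type references (plain strings)
    are not expanded, so recursive schemas do not cause infinite recursion.
    """
    flat = []
    for field in schema["fields"]:
        name = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "record":
            # Inline the nested record's fields under a prefixed name.
            flat.extend(flatten_record(ftype, prefix=name + sep, sep=sep))
        else:
            flat.append({"name": name, "type": ftype})
    return flat


def flatten_schema(schema, sep="_"):
    """Build a new record schema with no directly nested record fields."""
    return {
        "type": "record",
        "name": schema["name"],
        "fields": flatten_record(schema, sep=sep),
    }


# Illustrative schema, not taken from any real system.
nested = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "address", "type": {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ],
        }},
    ],
}

flat = flatten_schema(nested)
```

An adapter for a system without nested records could expose the flat schema
and keep the field-name mapping around to rebuild the nested form on write.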


>
>By all means define a language that converts from one Avro record into
>another.  An Avro expression language would be quite useful, actually.
>Putting it in the core, however, strikes me as feature creep.

Core should definitely remain simple.  Anything like this should be an
optional library.  Support for each transformation should be optional as
well -- many languages might have string <> int, while only a couple have
union branch materialization.
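
One way to keep each transformation optional is a small registry that a
language binding fills in only for the conversions it supports.  The sketch
below is an assumption about how that could look, not a proposal for the Avro
API; PROMOTIONS, promotion, and promote are hypothetical names.

```python
# Registry of optional value conversions, keyed by (writer type, reader type).
PROMOTIONS = {}


def promotion(writer_type, reader_type):
    """Decorator that registers a conversion for one pair of type names."""
    def register(fn):
        PROMOTIONS[(writer_type, reader_type)] = fn
        return fn
    return register


@promotion("int", "string")
def int_to_string(value):
    return str(value)


@promotion("string", "int")
def string_to_int(value):
    # Raises ValueError if the string is not a valid integer literal.
    return int(value)


def promote(value, writer_type, reader_type):
    """Convert value if a promotion is registered, otherwise raise TypeError."""
    if writer_type == reader_type:
        return value
    try:
        fn = PROMOTIONS[(writer_type, reader_type)]
    except KeyError:
        raise TypeError(
            f"no promotion from {writer_type} to {reader_type}"
        ) from None
    return fn(value)
```

A binding that does not implement a given conversion simply never registers
it, and callers get a clear error instead of silently wrong data.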

The more complicated transforms are mostly useful for frameworks that want
to use Avro in a way that can interop with other frameworks using Avro.

The initial reaction to the above statement is probably, "If they are both
using Avro already, shouldn't they automatically be able to share data?"
The answer is no.  They aren't using Avro as their internal schema system.
They are _translating_ between their internal schema system and Avro,
potentially applying various transformation rules.  So for schemas within
the lowest common denominator it works fine; for anything more complicated
it won't.  This is not a fault of Avro; it is the nature
of compatibility between two non-Avro schema systems.
Hive supports Maps with integers as keys.  Pig does not.  These can be
made to interop through Avro if both systems share their schema
translation techniques, but not otherwise.
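
As an illustration of a shared translation technique: Avro map keys are
always strings, so an integer-keyed map has to be represented some other way.
One possible convention, assumed here purely for the example, is an array of
key/value records.  The function names and the IntMapEntry record name are
hypothetical.

```python
def int_keyed_map_schema(value_type, entry_name="IntMapEntry"):
    """Avro schema (as a dict) for an int-keyed map stored as an array of entries."""
    return {
        "type": "array",
        "items": {
            "type": "record",
            "name": entry_name,
            "fields": [
                {"name": "key", "type": "int"},
                {"name": "value", "type": value_type},
            ],
        },
    }


def encode_int_map(mapping):
    """Turn {int: value} into a list of entry records, sorted by key."""
    return [{"key": k, "value": v} for k, v in sorted(mapping.items())]


def decode_int_map(entries):
    """Rebuild the {int: value} mapping from the list of entry records."""
    return {entry["key"]: entry["value"] for entry in entries}


schema = int_keyed_map_schema("string")
encoded = encode_int_map({3: "c", 1: "a"})
decoded = decode_int_map(encoded)
```

If both adapters agree on this convention, one side can rebuild the map and
the other can treat the data as an ordinary array (bag) of records.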

>
>-- Philip

