avro-user mailing list archives

From Joe Stein <Dev.Stein....@Dev.Bwater.Com>
Subject Asymmetric handling of aliases
Date Thu, 18 May 2017 12:03:48 GMT
Hi, we have the following case:

data set 1 with schema and a field - { "name": "x", "type" : "string" }
we have app1 and it does .get("x") generic retrieval
This application becomes long-lived and we don't want to (maybe can't) change it.

We want to change the name of the field. Let's say our new field name is "y". According
to the docs/spec, we are supposed to add it to aliases, so that a new producer can create data
referencing the improved name “y” and an old consumer can go on thinking in terms of “x” without
having to do any work.

The problem is that the world changes, and really the context of that field name should be "y" and
not "x". We want to do this because the schema should make sense in its current context, and context
for the current state is important. E.g. we used to call it "horse_drawn_carriage" and now we
want to call it "automobile" (or pda -> mobile_device; lots of things change in context over time).
There are lots of real-world examples that I don't want to (or can't) get into the weeds about;
hopefully my two random ones are enough to illustrate that the problem is real. We also
have cases where the name will likely change again over time, so if we kept using the current
approach and added more to aliases, you wouldn't know which of those aliases is really the current
one, which is why we favor the field name being the current context.

so we do

data set 2 with schema and a field - { "name": "y", "type" : "string", "aliases" :["x"]}
we have app2 and it does .get("y") generic retrieval, because that is how folks now know to
build their apps. The problem is that aliases are not bidirectional, so the old app can't reference
"x" to get at our data, and it breaks :(

So we came up with a patch that handles this ~ roughly ~

public static Object resolveField(GenericRecord genericRecord, String fieldName) {
        for (Schema.Field field : genericRecord.getSchema().getFields()) {
            // Direct match on the field's current name.
            if (field.name().equals(fieldName)) { return genericRecord.get(fieldName); }

            // Otherwise accept any of the field's aliases and read via the current name.
            for (String alias : field.aliases()) {
                if (fieldName.equals(alias)) { return genericRecord.get(field.name()); }
            }
        }
        // No field matches by name or alias.
        return null;
}
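
As a rough, self-contained illustration of the lookup above, here is the same idea with plain Java collections standing in for Avro's Schema.Field and GenericRecord, so it runs without Avro on the classpath. The names AliasLookup and FieldDef are made up for this sketch and are not part of Avro; it assumes Java 9+ for List.of/Set.of/Map.of.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class AliasLookup {
    // Hypothetical stand-in for Avro's Schema.Field: a current name plus historic aliases.
    static final class FieldDef {
        final String name;
        final Set<String> aliases;

        FieldDef(String name, Set<String> aliases) {
            this.name = name;
            this.aliases = aliases;
        }
    }

    // Returns the value stored under the field's current name, whether the caller
    // asks by that name or by one of its aliases; returns null if nothing matches.
    static Object resolveField(List<FieldDef> fields, Map<String, Object> record, String fieldName) {
        for (FieldDef field : fields) {
            if (field.name.equals(fieldName) || field.aliases.contains(fieldName)) {
                return record.get(field.name);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Data set 2: the field is now named "y" and keeps "x" as an alias.
        List<FieldDef> schema = List.of(new FieldDef("y", Set.of("x")));
        Map<String, Object> record = Map.of("y", "hello");

        System.out.println(resolveField(schema, record, "y")); // app2: current name
        System.out.println(resolveField(schema, record, "x")); // app1: historic alias
        System.out.println(resolveField(schema, record, "z")); // unknown name
    }
}
```

With this shape, app1 (asking for "x") and app2 (asking for "y") both reach the same value, while the schema itself only ever stores data under the current name.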

I wanted to check first whether we were missing something as we went through this, or whether
changing aliases this way is at odds with some principles the community holds that
we were not understanding or properly grokking. I am very open to the idea that we have gone down
the wrong path here; however, it does seem to solve the core problem we have with keeping the context
of the schema current. I could see this problem not being specific to us or our use case, and being one
that others have too.

If folks are in sync with this change, I would like to propose/create a patch and see about
making aliases work bidirectionally, allowing folks to use the name field as "the current
context of the name of the thing", with the list of aliases holding the historic names.



~ Joe Stein
