Mailing-List: contact user-help@avro.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@avro.apache.org
Received-SPF: pass (athena.apache.org: domain of busbey@cloudera.com
 designates 209.85.216.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CA2C0ADEC9BF4F6B898A33D79829C6C7@gmail.com>
References: <EC1924A8968440E1B54B3F45E31E1C66@gmail.com>
 <086FD9C0ED6A46C297C861B1369617E5@gmail.com>
 <CAGHyZ6KhssPCq=jCoDRHKbCzaAzvMVY2UMWLu=HCXVPprkS-0g@mail.gmail.com>
 <CA2C0ADEC9BF4F6B898A33D79829C6C7@gmail.com>
From: Sean Busbey <busbey@cloudera.com>
Date: Tue, 3 Feb 2015 12:13:05 -0600
Message-ID: 
 <CAGHyZ6KydR-fmXShRthHNunsccc_sC0ZzhGSWTZxboUtKsqSvg@mail.gmail.com>
Subject: Re: Adding new field with default value to an Avro schema
To: "user@avro apache. org" <user@avro.apache.org>
Content-Type: multipart/alternative; boundary=047d7bdc19ace6eed8050e3306f3

--047d7bdc19ace6eed8050e3306f3
Content-Type: text/plain; charset=UTF-8

On Tue, Feb 3, 2015 at 11:01 AM, Burak Emre <emrekabakci@gmail.com> wrote:

> @Sean thanks for the explanation.
>
> I have multiple writers but only one reader, and the only schema migration
> operation is adding a new field, so I thought I could use the same schema
> for all datasets, since the ordering will be the same in all of them even
> though some may contain extra fields that are also defined in the schema.
>
> Actually, I wanted to avoid using an external database for sequential
> schema ids, since it would make the system more complex than it should be
> in my case, but it seems this is the only option for now.
>
>
>

An external database isn't strictly required. The only important bit is
that each schema has a unique, immutable identifier. As Doug mentioned, you
could do this as an enum of schemas in your source code (so long as you
roll out updates in reader-then-writer order, so the reader already knows a
new schema before any writer starts using it). Similarly, you could do it by
relying on schema fingerprints and just loading avsc files out of shared
storage.
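
As a rough illustration of the fingerprint idea, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the helper
names (fingerprint, load_schemas), the Event schema, the file names, and the
temporary directory standing in for shared storage. The hash is also a
simplified stand-in (SHA-256 over key-sorted, whitespace-free JSON); Avro's
own fingerprints are defined over the schema's Parsing Canonical Form, so
treat this as a sketch of the lookup pattern, not of Avro's algorithm.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def fingerprint(schema_json: str) -> str:
    """Hypothetical stand-in fingerprint for a schema.

    Assumption: SHA-256 over key-sorted, whitespace-free JSON. Real Avro
    fingerprints use the Parsing Canonical Form instead.
    """
    normalized = json.dumps(
        json.loads(schema_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_schemas(schema_dir: Path) -> Dict[str, dict]:
    """Map fingerprint -> parsed schema for every .avsc file in schema_dir."""
    registry = {}
    for path in sorted(schema_dir.glob("*.avsc")):
        text = path.read_text(encoding="utf-8")
        registry[fingerprint(text)] = json.loads(text)
    return registry


# Two versions of a made-up record; v2 adds a field with a default value.
V1 = '{"type": "record", "name": "Event", "fields": [{"name": "id", "type": "long"}]}'
V2 = (
    '{"type": "record", "name": "Event", "fields": ['
    '{"name": "id", "type": "long"}, '
    '{"name": "source", "type": "string", "default": "unknown"}]}'
)

# A temporary directory stands in for the shared storage location.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    (shared / "event_v1.avsc").write_text(V1, encoding="utf-8")
    (shared / "event_v2.avsc").write_text(V2, encoding="utf-8")
    registry = load_schemas(shared)

# A writer tags each dataset with the fingerprint of the schema it used;
# the reader resolves that tag back to the exact writer schema.
writer_tag = fingerprint(V1)
writer_schema = registry[writer_tag]
```

Because the identifier is derived from the schema itself, writers need no
coordination to assign ids; they only have to publish the avsc file before
(or along with) the data that references it.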

-- 
Sean

--047d7bdc19ace6eed8050e3306f3--