From: Sean Busbey
Date: Tue, 3 Feb 2015 12:13:05 -0600
Subject: Re: Adding new field with default value to an Avro schema
To: user@avro.apache.org
References: <086FD9C0ED6A46C297C861B1369617E5@gmail.com>

On Tue, Feb 3, 2015 at 11:01 AM, Burak Emre <emrekabakci@gmail.com> wrote:

> @Sean thanks for the explanation.
>
> I have multiple writers but only one reader, and the only schema migration
> operation is adding a new field, so I thought I could use the same schema
> for all datasets, since the field ordering will be the same in all of them
> even though some may contain extra fields that are also defined in the
> schema definition.
>
> Actually, I wanted to avoid using an external database for sequential
> schema ids, since it would make the system more complex than it should be
> in my case, but it seems this is the only option for now.

An external database isn't strictly required. The only important bit is
that each schema have a unique immutable identifier. As Doug mentioned, you
could do this as an enum of schemas in your source code (so long as you
handled updates in reader-then-writer order, so that the reader already
knows a schema before any writer starts using it). Similarly, you could do
it by relying on schema fingerprints and just loading avsc files out of
shared storage.

--
Sean
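To make the fingerprint idea concrete, here is a minimal sketch in plain Python with no Avro library involved. It is an illustration under stated assumptions, not what Avro itself does: the fingerprint is a stand-in (SHA-256 over key-sorted, whitespace-free JSON), whereas real Avro fingerprints are computed over the schema's Parsing Canonical Form (in the Java library, see SchemaNormalization). The function names, the `User` schemas, and the file names are all hypothetical.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical schemas: v2 only adds a field with a default value.
USER_V1 = '{"type": "record", "name": "User", "fields": [{"name": "name", "type": "string"}]}'
USER_V2 = (
    '{"type": "record", "name": "User", "fields": ['
    '{"name": "name", "type": "string"}, '
    '{"name": "email", "type": ["null", "string"], "default": null}]}'
)


def fingerprint(schema_json: str) -> str:
    """Stand-in schema id: SHA-256 of key-sorted, compact JSON.

    ASSUMPTION: this is NOT Avro's Parsing Canonical Form; it only
    removes whitespace and key-order differences.
    """
    parsed = json.loads(schema_json)
    normalized = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_registry(directory: str) -> dict:
    """Read every *.avsc file in a (shared) directory, keyed by fingerprint."""
    registry = {}
    for path in sorted(Path(directory).glob("*.avsc")):
        text = path.read_text(encoding="utf-8")
        registry[fingerprint(text)] = json.loads(text)
    return registry


def tag_record(schema_json: str, payload: bytes) -> tuple:
    """Writer side: ship the schema id alongside the encoded record."""
    return fingerprint(schema_json), payload


def writer_schema_for(registry: dict, schema_id: str) -> dict:
    """Reader side: find the schema the record was written with."""
    if schema_id not in registry:
        raise LookupError("unknown schema id: " + schema_id)
    return registry[schema_id]
```

Because the id is derived from the schema content, it is unique and immutable without any sequence counter. The reader only needs the new avsc file to be in shared storage before a writer starts tagging records with it, which is the same reader-then-writer ordering mentioned above.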