From: James Campbell
To: user@avro.apache.org
Subject: Reading from disjoint schemas in map
Date: Tue, 13 May 2014 20:03:55 +0000

I'm trying to read data into a MapReduce job, where the data may have been created by one of a few different schemas, none of which are evolutions of one another (though they are related).

I have seen several people suggest using a union schema, such that during job setup, one would set the input schema to be the union:

    ArrayList<Schema> schemas = new ArrayList<Schema>();
    schemas.add(schema1);
    ...
    Schema unionSchema = Schema.createUnion(schemas);
    AvroJob.setInputKeySchema(job, unionSchema);

However, I don't know how to then extract the correct type inside my mapper (which was apparently trivial; sorry, I'm new to Avro).

I'd guess that the map function signature becomes map(AvroKey<GenericRecord> key, NullWritable value, ...), but how can I then get Avro to read the correctly typed data from the GenericRecord?

Thanks!

James
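
One possible approach, offered as a sketch rather than a confirmed answer: each GenericRecord carries the schema of the union branch it was read as, so inside the mapper the name returned by key.datum().getSchema().getFullName() could be used to pick the right handling code. The block below models only that dispatch step with the standard library so it runs on its own. NamedRecord, UnionDispatcher, and the schema names example.SchemaA and example.SchemaB are hypothetical stand-ins, not Avro classes or names from the original message.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for an Avro record: schema full name plus field values. */
final class NamedRecord {
    final String schemaFullName;
    final Map<String, Object> fields;

    NamedRecord(String schemaFullName, Map<String, Object> fields) {
        this.schemaFullName = schemaFullName;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.fields = Collections.unmodifiableMap(new HashMap<>(fields));
    }
}

/** Routes each record to the handler registered for its schema name. */
final class UnionDispatcher {
    private final Map<String, Function<NamedRecord, String>> handlers = new HashMap<>();

    /** Registers one handler per union branch; returns this for chaining. */
    UnionDispatcher register(String schemaFullName, Function<NamedRecord, String> handler) {
        handlers.put(schemaFullName, handler);
        return this;
    }

    /** Looks up the handler by schema name and fails loudly on an unknown branch. */
    String dispatch(NamedRecord record) {
        Function<NamedRecord, String> handler = handlers.get(record.schemaFullName);
        if (handler == null) {
            throw new IllegalArgumentException("Unexpected schema: " + record.schemaFullName);
        }
        return handler.apply(record);
    }
}

final class UnionDispatchDemo {
    public static void main(String[] args) {
        // In a real mapper the schema name would come from the datum itself
        // (assumption: key.datum().getSchema().getFullName()).
        UnionDispatcher dispatcher = new UnionDispatcher()
            .register("example.SchemaA", r -> "A:" + r.fields.get("id"))
            .register("example.SchemaB", r -> "B:" + r.fields.get("name"));

        NamedRecord record = new NamedRecord(
            "example.SchemaA", Collections.<String, Object>singletonMap("id", 7));
        System.out.println(dispatcher.dispatch(record));
    }
}
```

Keying on the full schema name rather than on which fields happen to be present keeps related schemas with overlapping fields apart, and the exception makes a branch missing from the union visible instead of silently skipped.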