Subject: Re: Writing to multiple AvroSchemas in MapReduce
From: Nishanth S
To: user@avro.apache.org
Date: Thu, 25 Jun 2015 15:33:46 -0600
In-Reply-To: <272721792.1379421.1435265581538.JavaMail.yahoo@mail.yahoo.com>

Thank you, Sam. I am trying to read a single binary file in MapReduce and
split it into 4 Avro files, each with a different schema. I am trying to do
this in one job, but I am still not sure how to specify multiple output
schemas on an AvroJob instance. Do we need to create multiple instances of
AvroJob in the MapReduce driver to do this?

Thanks,
Nishan

On Thu, Jun 25, 2015 at 2:53 PM, Sam Groth <sgroth@yahoo-inc.com> wrote:

> If you process 4 files with schemas A, B, C, and D as the writer schemas,
> then I would assume that you would want to specify the reader schema using
> the setInput*Schema methods. Then you can set the writer schema with the
> methods that you are calling. To be clear, all data processed by the job
> should have one reader schema that is determined when the data is read, and
> there should also be one writer schema (possibly different from the reader
> schema) when the data is written back to files. If you need to process the
> data from each schema independently, you should probably create one job for
> each schema.
>
> Disclaimer: I have never used the AvroJob interface directly, so this is
> just me inferring what I think it should do based on my experience with
> AvroStorage and the other language-specific Avro interfaces.
>
> Hope this helps,
> Sam
>
>
> On Thursday, June 25, 2015 12:53 PM, Nishanth S <chinchu2884@gmail.com>
> wrote:
>
> Hello All,
>
> We are using Avro 1.7.7 and Hadoop 2.5.1 in our project. We need to
> process a mixed-mode binary file using MapReduce and produce multiple Avro
> files as output, each with a different Avro schema. I looked at the
> AvroMultipleOutputs class but did not completely understand what needs to
> be done in the driver class. This is a map-only job whose output should be
> 4 different Avro files (each with a different Avro schema) written to
> different HDFS directories.
>
> Do we need to set all key and value Avro schemas on AvroJob in the driver
> class?
>
> AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.NULL));
> AvroJob.setOutputValueSchema(job, A.getClassSchema());
>
> Now, if I have schemas B, C, and D, how would these be set on AvroJob?
> Thanks for your help.
>
> Thanks,
> Nishan