Date: Thu, 6 Mar 2014 16:09:00 +0800
Subject: MapReduce: How to output multiple Avro files?
From: Fengyun RAO
To: user@hadoop.apache.org

Each line of our input text may be parsed into, e.g., an A or a B object. We want all A objects written to "A.avro" files, and all B objects written to "B.avro" files.

I looked into the AvroMultipleOutputs class:
http://avro.apache.org/docs/1.7.4/api/java/org/apache/avro/mapreduce/AvroMultipleOutputs.html
There is an example, but it is not quite clear. For job submission, it uses AvroMultipleOutputs.addNamedOutput to add the schemas for A and B. In my program this looks like:

        AvroMultipleOutputs.addNamedOutput(job, "A", AvroKeyOutputFormat.class, aSchema, null);
        AvroMultipleOutputs.addNamedOutput(job, "B", AvroKeyOutputFormat.class, bSchema, null);

I believe this is for the Reducer output files.

*My question is* what the Mapper output should be, specifically what "job.setMapOutputValueClass" should be set to, since the Mapper output could be either an A or a B object, with schema aSchema or bSchema.

In my program, I simply set it to GenericData, but I get the error below:

14/03/06 15:55:34 INFO mapreduce.Job: Task Id : attempt_1393817780522_0012_m_000010_2, Status : FAILED
Error: java.lang.NullPointerException
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:989)
        at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
        at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)

I have no idea what this means.
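
To make the routing we are after concrete, here is a minimal plain-Java sketch. It does not use Hadoop or Avro at all and is not a fix for the error above; it only shows the intended behaviour, with every A record collected under the output name "A" and every B record under "B". The class name RecordRouter, the methods outputNameFor and route, and the "A:" / "B:" line prefixes are all hypothetical, made up for illustration; they stand in for the real parser and schemas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only: no Hadoop or Avro classes are used here.
final class RecordRouter {

    // Hypothetical parse rule: a line starting with "A:" is an A record,
    // a line starting with "B:" is a B record.
    static String outputNameFor(String line) {
        if (line.startsWith("A:")) {
            return "A";
        }
        if (line.startsWith("B:")) {
            return "B";
        }
        throw new IllegalArgumentException("Unrecognized line: " + line);
    }

    // Collects each line's payload under the output name it belongs to,
    // mimicking "all A objects go to A.avro, all B objects go to B.avro".
    static Map<String, List<String>> route(List<String> lines) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("A", new ArrayList<>());
        outputs.put("B", new ArrayList<>());
        for (String line : lines) {
            String name = outputNameFor(line);
            // Drop the two-character prefix and keep the payload.
            outputs.get(name).add(line.substring(2));
        }
        return outputs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs =
                route(Arrays.asList("A:first", "B:second", "A:third"));
        // Prints {A=[first, third], B=[second]}
        System.out.println(outputs);
    }
}
```

In the actual job, the name returned here ("A" or "B") would correspond to the named output registered with AvroMultipleOutputs.addNamedOutput.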