Date: Wed, 23 Oct 2013 14:59:38 -0400
Subject: Avro Input > Text output MapReduce new Avro API
From: DSuiter RDX
To: user@avro.apache.org

Hello,

We are trying to take an Avro input, send it through MapReduce, and get a text output. We are trying to stay consistent with the org.apache.avro.mapreduce API, and are having a lot of trouble understanding how to extract the wrapped Avro input and map it out as text. Below are our driver and mapper, and a log of the error. Would anyone be able to provide some direction? I have seen the examples from the Avro project pages, and the popularly referenced Git examples, but they are all written against the org.apache.avro.mapred API. There is one example from Chris White on Gist for Avro *output*, but nothing for input; it seems like even that one uses org.apache.avro.mapred.AvroKey, which you will see our code uses as well.
I have seen mention of using the org.apache.avro.mapreduce.* input and output objects as wrappers for MapReduce primitives, but cannot find examples anywhere, and there are no examples or instructions on the Apache pages as there are for org.apache.avro.mapred.

Driver:

import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class RTTicketCount extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.printf("Usage: RTTicketCount <input dir> <output dir>\n");
      return -1;
    }

    // String input = args[0];
    // String output = args[1];

    Job job = new Job(getConf());
    job.setJobName("RTTicketCount");
    job.setJarByClass(RTTicketCount.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(TicketMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    AvroJob.setInputKeySchema(job, Event.getClassSchema());
    job.setInputFormatClass(AvroKeyInputFormat.class);

    boolean success = job.waitForCompletion(true);
    return success ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(), new RTTicketCount(), args);
    System.exit(exitCode);
  }
}

Mapper:

import java.io.IOException;

import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TicketMapper extends
    Mapper<AvroKey<Event>, NullWritable, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);

  protected void map(AvroKey<Event> key, Iterable<NullWritable> values,
      Context context) throws IOException, InterruptedException {
    String check = key.datum().getBody().toString();
    if (check.contains("Ticket")) {
      context.write(new Text(check), one);
    }
  }
}

Log output of exception:

13/10/23 13:57:11 INFO mapred.JobClient: Task Id : attempt_201310160855_0007_m_000000_0, Status : FAILED
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.avro.mapred.AvroKey
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:890)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:601)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
	at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:120)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred

Thank You,

Devin Suiter
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com
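P.S. While writing this up, we started to wonder whether our map() is even being called. Its second parameter is an Iterable<NullWritable>, but as far as we can tell the new-API Mapper.map() takes a single value, so our method may be *overloading* rather than *overriding* — in which case Hadoop's default identity mapper would run and emit the AvroKey straight through, which would explain the type-mismatch error. Here is a standalone toy that reproduces the pitfall we suspect (MockMapper, BadMapper, GoodMapper, and OverrideCheck are our own mock classes for illustration, not the real Hadoop API):

```java
// Toy stand-in for the Hadoop base class, just to test the overriding rule.
class MockMapper<KIN, VIN> {
    protected String map(KIN key, VIN value) {
        return "identity:" + key;   // default "identity" behavior, like Mapper.map()
    }
    public String run(KIN key, VIN value) {
        return map(key, value);     // dispatches to the subclass override, if any
    }
}

// Second parameter is Iterable<VIN> instead of VIN: this OVERLOADS the base
// method, it does not override it, so the base "identity" version still runs.
class BadMapper extends MockMapper<String, Integer> {
    protected String map(String key, Iterable<Integer> values) {
        return "custom:" + key;
    }
}

// Matching signature: this genuinely overrides.
class GoodMapper extends MockMapper<String, Integer> {
    @Override
    protected String map(String key, Integer value) {
        return "custom:" + key;
    }
}

public class OverrideCheck {
    public static void main(String[] args) {
        System.out.println(new BadMapper().run("k", 1));   // prints identity:k
        System.out.println(new GoodMapper().run("k", 1));  // prints custom:k
    }
}
```

If that is indeed our problem, taking a single NullWritable value instead of the Iterable, and adding @Override so the compiler checks the signature, should fix it — but we would appreciate confirmation from someone who knows the org.apache.avro.mapreduce API.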