avro-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: Reading AVRO files in Java MapReduce
Date Thu, 17 Nov 2011 08:27:55 GMT
Okay... I think I am down to my last problem.  I have an 'AvroMapper' and a
'NonAvroReducer'.  The Reducer is defined as follows:

    private static class NonAvroReducer
            extends MapReduceBase
            implements Reducer<AvroKey<Utf8>, AvroValue<MyClass>, Text, Text> {


I keep getting this in the Reduce step:

java.lang.ClassCastException: com.xyz.MyAvroProcessor$MyClass cannot be cast to org.apache.avro.generic.IndexedRecord
    at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)

But 'MyClass' is just a Java Bean that I want to use to pass data between
Mapper & Reducer.  Why do I have to implement IndexedRecord?
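
A minimal, stdlib-only sketch of what the trace appears to show, assuming the
generic reader fills each record by position through an unconditional cast to
an indexed-record interface. Every name here (IndexedRecordLike, PlainBean,
IndexedBean, setField, canBeFilled) is a hypothetical stand-in for
illustration, not an Avro class or method:

```java
public class IndexedRecordCastSketch {

    // Hypothetical stand-in for org.apache.avro.generic.IndexedRecord.
    public interface IndexedRecordLike {
        void put(int position, Object value);
        Object get(int position);
    }

    // A plain Java bean: named accessors only, no positional access.
    public static class PlainBean {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // The same data, but with fields reachable by position.
    public static class IndexedBean implements IndexedRecordLike {
        private String name;

        @Override
        public void put(int position, Object value) {
            if (position != 0) {
                throw new IndexOutOfBoundsException("no field " + position);
            }
            name = (String) value;
        }

        @Override
        public Object get(int position) {
            if (position != 0) {
                throw new IndexOutOfBoundsException("no field " + position);
            }
            return name;
        }
    }

    // Assumed shape of a generic field setter: it casts without checking,
    // so any object that lacks the interface fails with ClassCastException.
    public static void setField(Object record, int position, Object value) {
        ((IndexedRecordLike) record).put(position, value);
    }

    // Returns true if the record could be filled by position, false if the cast failed.
    public static boolean canBeFilled(Object record) {
        try {
            setField(record, 0, "abc");
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("plain bean:   " + canBeFilled(new PlainBean()));
        System.out.println("indexed bean: " + canBeFilled(new IndexedBean()));
    }
}
```

Under that assumption the plain bean fails only because the reader writes
fields by position, which would match the setField frame in the trace above.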

Please help.  Thanks.


On Wed, Nov 16, 2011 at 11:20 PM, Something Something <mailinglists19@gmail.com> wrote:

> Sorry.  Don't worry about this for now.  Made progress.  The input AVRO
> file was bad.  Thanks.
>
>
> On Wed, Nov 16, 2011 at 11:13 PM, Sudharsan Sampath <sudhan65@gmail.com> wrote:
>
>> What are the output logs from your job?
>>
>>
>> On Thu, Nov 17, 2011 at 5:30 AM, Something Something <mailinglists19@gmail.com> wrote:
>>
>>> Why could this not be working?  Any ideas?  I even put a 'throw new
>>> RuntimeException' to see if it's coming to the Mapper, but it isn't.
>>>  Thanks for the help.
>>>
>>> Code snippet:
>>>
>>>
>>>
>>>     public static class MapImpl extends AvroMapper<Utf8, Pair<Utf8, Long>> {
>>>         @Override
>>>         public void map(Utf8 text, AvroCollector<Pair<Utf8, Long>> collector,
>>>                         Reporter reporter) throws IOException {
>>>             throw new RuntimeException("my text: " + text.toString());
>>>             // System.out.println("my text" + text);
>>>             //
>>>             // collector.collect(new Pair<Utf8, Long>(text, 1L));
>>>         }
>>>     }
>>>
>>>
>>>
>>>     private static class NonAvroReducer
>>>             extends MapReduceBase
>>>             implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, Text> {
>>>
>>>         public void reduce(AvroKey<Utf8> key, Iterator<AvroValue<Long>> values,
>>>                            OutputCollector<Text, Text> out,
>>>                            Reporter reporter) throws IOException {
>>>             out.collect(new Text(key.toString()), new Text("Testing"));
>>>             while (values.hasNext()) {
>>>                 AvroValue<Long> value = values.next();
>>>                 out.collect(new Text(key.toString()),
>>>                         new Text(value.datum().toString()));
>>>             }
>>>         }
>>>     }
>>>
>>>
>>>     public static void main(String[] args) throws Exception {
>>>         String dir = "/user/mydir";
>>>
>>>         JobConf job = new JobConf(new Configuration(), TestAvroProcessor.class);
>>>         job.setJobName(TestAvroProcessor.class.getName());
>>>
>>>         Path outputPath = new Path(dir + "/out");
>>>         outputPath.getFileSystem(job).delete(outputPath);
>>>
>>>         AvroJob.setInputSchema(job, Schema.parse(new File("/Users/mydir/profiles.json")));
>>>         AvroJob.setOutputSchema(job, SCHEMA);
>>>         AvroJob.setMapperClass(job, MapImpl.class);
>>>
>>>         FileInputFormat.setInputPaths(job, new Path(dir + "/data"));
>>>         FileOutputFormat.setOutputPath(job, outputPath);
>>>         FileOutputFormat.setCompressOutput(job, false);
>>>
>>>         job.setReducerClass(NonAvroReducer.class);
>>>         job.setOutputFormat(TextOutputFormat.class);
>>>         job.setOutputKeyClass(Text.class);
>>>         job.setOutputValueClass(Text.class);
>>>
>>>         JobClient.runJob(job);
>>>     }
>>>
>>>
>>>
>>>
>>> On Tue, Nov 15, 2011 at 5:12 PM, Doug Cutting <cutting@apache.org> wrote:
>>>
>>>> On 11/15/2011 03:16 PM, Something Something wrote:
>>>> > Quick question.  I want the output from AvroJob.setReducerClass to be in
>>>> > regular Text files - not in AVRO format.  Can I do that?  Any examples?
>>>> > Sorry, kinda short on time to do research.  Thanks.
>>>>
>>>> On the previously cited documentation page:
>>>>
>>>>
>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/package-summary.html
>>>>
>>>> Look for the text, "For jobs whose input is an Avro data file and which
>>>> use an AvroMapper, but whose reducer is a non-Avro Reducer and whose
>>>> output is a non-Avro format".
>>>>
>>>> A sample of a job that does this is at:
>>>>
>>>>  http://s.apache.org/MsG
>>>>
>>>> Just use TextOutputFormat instead of SequenceFileOutputFormat.
>>>>
>>>> Doug
>>>>
>>>
>>>
>>
>
