hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Emitting Java Collection as mapper output
Date Tue, 10 Jul 2012 11:35:50 GMT
Short answer: Yes.

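With Writable serialization there is *some* support for collection
structures, in the form of MapWritable and ArrayWritable, and you can
use these classes as your map output value. A plain java.util.List
does not implement Writable, so Hadoop finds no serializer for it;
that is the NullPointerException in SerializationFactory.getSerializer
in your trace.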
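
To make the Writable contract concrete, here is a minimal sketch of
what a list-valued map output has to do: write an element count
followed by each element, and read them back in the same order. It
uses only java.io so it runs without Hadoop on the classpath. The
class name TextListValue and the roundTrip helper are hypothetical;
in a real job the class would declare "implements
org.apache.hadoop.io.Writable" and would likely hold Text rather than
String.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical value type. In a real job it would declare
// "implements org.apache.hadoop.io.Writable"; the two methods below
// already have the signatures that interface requires.
class TextListValue {
    private final List<String> values = new ArrayList<String>();

    // Hadoop instantiates value objects reflectively, so a no-arg
    // constructor must exist.
    public TextListValue() {}

    public TextListValue(List<String> initial) {
        values.addAll(initial);
    }

    public List<String> getValues() {
        return values;
    }

    // Serialize: element count first, then each element.
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());
        for (String v : values) {
            out.writeUTF(v);
        }
    }

    // Deserialize: clear old state first, because the same object
    // may be reused for several records.
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            values.add(in.readUTF());
        }
    }

    // Helper for this sketch only: serialize to bytes, then read the
    // bytes back into a fresh instance.
    public static TextListValue roundTrip(TextListValue original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            TextListValue copy = new TextListValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TextListValue sent =
                new TextListValue(Arrays.asList("x-coord", "y-coord"));
        System.out.println(roundTrip(sent).getValues());
    }
}
```

In the job itself you would normally not hand-roll this. The usual
route is a small subclass of ArrayWritable with a no-argument
constructor that calls super(Text.class), passed to
job.setMapOutputValueClass(...) in place of List.class.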

However, I suggest using Apache Avro for these things. Its
schema/reflect-oriented serialization is a much better fit than
Writables. See http://avro.apache.org

On Tue, Jul 10, 2012 at 4:45 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
> Hello list,
>
>       Is it possible to emit Java collections from a mapper??
>
> My code looks like this -
> public class UKOOAMapper extends Mapper<LongWritable, Text,
> LongWritable, List<Text>> {
>
>         public static Text CDPX = new Text();
>         public static Text CDPY = new Text();
>         public static List<Text> vals = new ArrayList<Text>();
>         public static LongWritable count = new LongWritable(1);
>
>         public void map(LongWritable key, Text value, Context context)
>                         throws IOException, InterruptedException {
>                 String line = value.toString();
>                 if (line.startsWith("Q")) {
>                         CDPX.set(line.substring(2, 13).trim());
>                         CDPY.set(line.substring(20, 25).trim());
>                         vals.add(CDPX);
>                         vals.add(CDPY);
>                         context.write(count, vals);
>                 }
>         }
> }
>
> And the driver class is -
> public static void main(String[] args) throws IOException,
> InterruptedException, ClassNotFoundException {
>
>                 Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
>                 Configuration conf = new Configuration();
>                 Job job = new Job(conf, "SupportFileValidation");
>                 conf.set("mapreduce.output.key.field.separator", "              ");
>                 job.setMapOutputValueClass(List.class);
>                 job.setOutputKeyClass(LongWritable.class);
>                 job.setOutputValueClass(Text.class);
>                 job.setMapperClass(UKOOAMapper.class);
>                 job.setReducerClass(ValidationReducer.class);
>                 job.setInputFormatClass(TextInputFormat.class);
>                 job.setOutputFormatClass(TextOutputFormat.class);
>                 FileInputFormat.addInputPath(job, filePath);
>                 FileOutputFormat.setOutputPath(job, new Path("/mapout/"+filePath));
>                 job.waitForCompletion(true);
>         }
>
> When I am trying to execute the program, I am getting the following error -
> 12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes
> where applicable
> 12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the
> same.
> 12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
> 12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
> 12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
> 12/07/10 16:41:46 INFO mapred.Task:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
> 12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
> 12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
> 12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
> 12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.NullPointerException
>         at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
>         at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/07/10 16:41:47 INFO mapred.JobClient:  map 0% reduce 0%
> 12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
> 12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0
>
> Need some guidance from the experts. Please let me know where I am
> going wrong. Many thanks.
>
> Regards,
>     Mohammad Tariq



-- 
Harsh J
