hadoop-common-user mailing list archives

From Shambhavi Punja <spu...@usc.edu>
Subject Json Parsing in map reduce.
Date Thu, 30 Apr 2015 17:01:08 GMT
Hi,

I am working on an assignment on Hadoop MapReduce, and I am very new to MapReduce.

The assignment has several parts, but for now I am trying to parse JSON data.

The input (i.e. the value) to the map function is a single record of the form

    xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}

I am only interested in getting the frequency of value1.
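To see which piece of this record the mapper hands to the JSON parser, here is a minimal stdlib-only sketch that applies the same split pattern, `"(?<=\\}),\\s"`, to the example record (quotes normalized to plain apostrophes). The class name `RecordSplitDemo` is hypothetical. The lookbehind only matches a ", " that directly follows a `}`, so the comma after `xyz` and the comma inside `'pq1, pq2'` are left alone.

```java
// Sketch: shows how the mapper's split pattern divides one input record.
// The sample record is the example from this post; RecordSplitDemo is a made-up name.
class RecordSplitDemo {

    // Same pattern as the mapper: split on ", " only when it directly follows '}'.
    // The lookbehind (?<=\}) is zero-width, so the '}' stays in the left-hand piece.
    static String[] splitRecord(String line) {
        return line.split("(?<=\\}),\\s");
    }

    public static void main(String[] args) {
        String line = "xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}";
        String[] tuple = splitRecord(line);
        System.out.println(tuple.length); // 2
        System.out.println(tuple[0]);     // xyz, {'abc':'pqr1','abc2':'pq1, pq2'}
        System.out.println(tuple[1]);     // {'key':'value1'}
    }
}
```

So `tuple[1]` is exactly the `{'key':'value1'}` part that the mapper passes to `JSONObject`.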

Here is the MapReduce job:

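// Imports needed at the top of Wordcount.java (package org.myorg)
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.json.JSONException;
import org.json.JSONObject;

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        String line = value.toString();
        // Split on ", " only where it directly follows '}', so tuple[1] is the {'key':...} part
        String[] tuple = line.split("(?<=\\}),\\s");
        try {
            JSONObject obj = new JSONObject(tuple[1]);
            String id = obj.getString("key");
            // Emit (value of 'key', 1) so the reducer can count occurrences
            word.set(id);
            output.collect(word, one);
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
}

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
            Reporter reporter) throws IOException {
        // Sum the 1s emitted by the mapper for this value
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}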
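To check the counting logic on its own, without Hadoop or the org.json jar, the map, shuffle and reduce steps above can be simulated in plain Java. This is a sketch under stated assumptions: a regular expression stands in for `JSONObject` (it only understands the flat `{'key':'value'}` shape of the example), and the class name `FrequencySketch`, its methods, and the two extra input records are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: local simulation of the job's map -> shuffle -> reduce flow.
// ASSUMPTION: a regex replaces org.json so this runs with the JDK alone;
// it only handles a flat 'key':'value' pair, not general JSON.
class FrequencySketch {

    private static final Pattern KEY_VALUE = Pattern.compile("'key'\\s*:\\s*'([^']*)'");

    // "map": split the record like the mapper does and pull out the value of 'key'.
    // Returns null when the record has no second part or no 'key' entry.
    static String mapRecord(String line) {
        String[] tuple = line.split("(?<=\\}),\\s");
        if (tuple.length < 2) {
            return null; // malformed record: nothing to emit
        }
        Matcher m = KEY_VALUE.matcher(tuple[1]);
        return m.find() ? m.group(1) : null;
    }

    // "shuffle" + "reduce": group emitted values by key and add 1 per occurrence.
    static Map<String, Integer> countValues(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String value = mapRecord(line);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "xyz, {'abc':'pqr1','abc2':'pq1, pq2'}, {'key':'value1'}",
                "abc, {'abc':'pqr2'}, {'key':'value2'}",
                "def, {'abc':'pqr3'}, {'key':'value1'}");
        System.out.println(countValues(input)); // {value1=2, value2=1}
    }
}
```

A `TreeMap` is used only so the printed counts come out in a stable, sorted order, similar to how the reducer sees keys in sorted order.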

I successfully compiled the Java code against the JSON and Hadoop jars and created a jar. But when
I run the Hadoop command, I get the following exceptions:


15/04/30 00:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/30 00:36:49 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/04/30 00:36:49 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/30 00:36:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/04/30 00:36:49 INFO mapred.JobClient: Running job: job_local1121514690_0001
15/04/30 00:36:49 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/30 00:36:49 INFO mapred.LocalJobRunner: Starting task: attempt_local1121514690_0001_m_000000_0
15/04/30 00:36:49 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
15/04/30 00:36:49 INFO mapred.MapTask: Processing split: file:/Users/Shamvi/gumgum/jars/input/ab1.txt:0+305
15/04/30 00:36:49 INFO mapred.MapTask: numReduceTasks: 1
15/04/30 00:36:49 INFO mapred.MapTask: io.sort.mb = 100
15/04/30 00:36:49 INFO mapred.MapTask: data buffer = 79691776/99614720
15/04/30 00:36:49 INFO mapred.MapTask: record buffer = 262144/327680
15/04/30 00:36:49 INFO mapred.LocalJobRunner: Map task executor complete.
15/04/30 00:36:49 WARN mapred.LocalJobRunner: job_local1121514690_0001
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 10 more
Caused by: java.lang.NoClassDefFoundError: org/json/JSONException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:881)
	at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:968)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
	... 15 more
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 22 more
15/04/30 00:36:50 INFO mapred.JobClient:  map 0% reduce 0%
15/04/30 00:36:50 INFO mapred.JobClient: Job complete: job_local1121514690_0001
15/04/30 00:36:50 INFO mapred.JobClient: Counters: 0
15/04/30 00:36:50 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.myorg.Wordcount.main(Wordcount.java:64)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)


PS: When I modify the same code to skip the JSON parsing, i.e. count the frequency of the whole {'key':'value1'}
section of the example input, everything works.
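The innermost cause in the trace is `ClassNotFoundException: org.json.JSONException`, raised while Hadoop resolves the mapper class (`JobConf.getMapperClass` calling `Configuration.getClassByName`). Together with the PS, this suggests the org.json classes were available at compile time but not on the classpath of the running job. A small stdlib-only check, sketched here with the hypothetical class name `ClasspathCheck`, reports whether a class can be resolved by the class loader it runs under; running it with the same classpath the job uses should show whether org.json is visible there.

```java
// Sketch: reports whether a class can be resolved by this class's loader,
// which is the kind of lookup that fails at the bottom of the trace above.
// ClasspathCheck is a made-up name for illustration.
class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: only resolve the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: not on the classpath;
            // LinkageError (e.g. NoClassDefFoundError): found but a dependency is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.json.JSONException";
        System.out.println(name + (isLoadable(name) ? " is" : " is NOT") + " visible on this classpath");
    }
}
```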

