hadoop-common-user mailing list archives

From Denis Kreis <de.kr...@gmail.com>
Subject Re: Issue with DistributedCache
Date Thu, 24 Nov 2011 13:05:01 GMT
Hi Bejoy

1. Old API:
The Map and Reduce classes are the same as in the example; the main
method is as follows:

public static void main(String[] args) throws IOException, InterruptedException {
    UserGroupInformation ugi = UserGroupInformation.createProxyUser(
            "<remote user name>", UserGroupInformation.getLoginUser());
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
            FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));

            conf.set("mapred.job.tracker", "<ip:8021>");

            FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
            fs.mkdirs(new Path("<remote path>"));
            fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));

            DistributedCache.addArchiveToClassPath(new Path("<remote path>/test.jar"), conf, fs);

            JobClient.runJob(conf);

            return null;
        }
    });
}
It works fine.
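As a quick sanity check (assuming the Hadoop 0.20/1.x property names, which I
haven't double-checked), the registered archive should be visible in the job
configuration right before JobClient.runJob(conf):

    // Untested sketch: the classpath archive should show up in these properties
    System.out.println(conf.get("mapred.job.classpath.archives")); // expect: <remote path>/test.jar
    System.out.println(conf.get("mapred.cache.archives"));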

2. New API:

public class WordCountNewAPI {

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable ONE = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            Iterator<IntWritable> iter = values.iterator();
            while (iter.hasNext()) {
                sum += iter.next().get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
                "<remote user name>", UserGroupInformation.getLoginUser());
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Configuration conf = new Configuration();
                conf.set("mapred.job.tracker", "<ip:8021>");

                Job job = new Job(conf, "wordcount");
                job.setJarByClass(WordCountNewAPI.class);

                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);

                job.setMapperClass(WordCountMapper.class);
                job.setCombinerClass(WordCountReducer.class);
                job.setReducerClass(WordCountReducer.class);

                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);

                FileInputFormat.setInputPaths(job, new Path("<path to input dir>"));
                FileOutputFormat.setOutputPath(job, new Path("<path to output dir>"));

                FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
                fs.mkdirs(new Path("<remote path>"));
                fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));

                DistributedCache.addArchiveToClassPath(new Path("<remote path>/test.jar"), conf, fs);

                boolean b = job.waitForCompletion(true);
                if (!b) {
                    throw new IOException("error with job!");
                }

                return null;
            }
        });
    }

}
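One thing I noticed while re-reading the new-API version: new Job(conf,
"wordcount") takes a copy of the configuration, so the
DistributedCache.addArchiveToClassPath(..., conf, fs) call at the end may be
modifying a Configuration the job no longer reads. A minimal variant I could
try (untested sketch, same placeholders as above), registering the archive on
the configuration the Job actually uses:

    // Untested sketch: add the archive through job.getConfiguration(),
    // since the Job holds its own copy of conf after construction
    FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), job.getConfiguration());
    fs.mkdirs(new Path("<remote path>"));
    fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));
    DistributedCache.addArchiveToClassPath(new Path("<remote path>/test.jar"),
            job.getConfiguration(), fs);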

2011/11/24 Bejoy Ks <bejoy.hadoop@gmail.com>:
> Hi Denis
>       Unfortunately the mailing list strips off attachments, so it'd be
> great if you could paste the source in some location and share its URL.
> If the source is small enough then please include it in the message
> body.
>
> For a quick comparison, try comparing your code with the following sample.
> I scribbled it a while back and it was working:
> http://kickstarthadoop.blogspot.com/2011/05/word-count-example-with-hadoop-020.html
>
> Hope it helps!
>
> Regards
> Bejoy.K.S
>
> On Thu, Nov 24, 2011 at 4:20 PM, Denis Kreis <de.kreis@gmail.com> wrote:
>
>> Hi
>>
>> I'm trying to modify the word count example
>> (http://wiki.apache.org/hadoop/WordCount) to use the new API
>> (org.apache.hadoop.mapreduce.*). I run the job on a remote
>> pseudo-distributed cluster. It works fine with the old API, but when I
>> use the new one, I'm getting this:
>>
>>
>> 11/11/24 11:28:02 INFO mapred.JobClient: Task Id : attempt_201111241046_0005_m_000000_0, Status : FAILED
>> java.lang.RuntimeException: java.lang.ClassNotFoundException: WordCountNewAPI$WordCountMapper
>>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:866)
>>        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:719)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:249)
>> Caused by: java.lang.ClassNotFoundException: WordCountNewAPI$WordCountMapper
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>        at java.lang.Class.forName0(Native Method)
>>        at java.lang.Class.forName(Class.java:247)
>>        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
>>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:864)
>>        ... 8 more
>>
>> The sources are in the attachment
>>
>> Regards
>> Denis
>>
>
