From: Harsh J
To: common-user@hadoop.apache.org
Date: Sat, 16 Oct 2010 21:38:12 +0530
Subject: Re: pls clarify on this

You can emit whichever Writable you like, but as per your given code, your Reducer class (the class definition line, specifically) is looking for an IntWritable in the value's iterable. Change that to Text and it should do what you expect :)

For reference, look into JobConf.setMapOutputKeyClass(...) and JobConf.setMapOutputValueClass(...) for configuring the intermediate key and value types (which become a reducer's input) explicitly.
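A minimal sketch of that explicit configuration, against the old org.apache.hadoop.mapred API used throughout this thread (conf is the JobConf from the WordProcess code quoted below; the Text choices match what its mapper emits):

    JobConf conf = new JobConf(WordProcess.class);

    // Intermediate (map output) key/value types; these must match what the
    // mapper emits and what the reducer declares as its input:
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);

    // Final (job output) types, as already set in the quoted code:
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

When the map-output classes are left unset they default to the job-output classes, and generics are erased at runtime, so the mismatch in the quoted code only surfaces as the ClassCastException shown below, at the first values.next() call.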
On Oct 16, 2010 9:14 PM, "Tri Doan" wrote:

Hi Harsh J,

Since I use the map function to emit a pair of (file id, content), which the reduce function then uses to combine all the text content belonging to the same file and extract only the content between the tags, I thought I could change the OutputCollector's value type to achieve this goal. Can I emit a Text object? Or what should I do to emit something like a Text together with a file id, so that reduce can combine them for further processing?

best regards

Tri Doan
1429 Laramie Apt 3, Manhattan KS 66502
USA
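For reference, the class-definition change Harsh J describes would look roughly like this against the WordProcess code quoted below (a sketch only: the value type in the Reducer declaration and the reduce() signature changes from IntWritable to Text; the tag extraction is elided):

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Concatenate every chunk of this file's content, as the original does.
        StringBuilder str = new StringBuilder();
        while (values.hasNext()) {
          str.append(values.next().toString());
        }
        output.collect(key, new Text(str.toString()));
      }
    }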
----- Original Message -----
From: "Harsh J"
To: common-user@hadoop.apache.org
Sent: Saturday, October 16, 2010 8:23:08 AM
Subject: Re: how to fix this error

Your mapper must emit an IntWritable as the value's type if you want to use that in your reducer. Right now you are emitting a Text object instead.

On Oct 16, 2010 8:27 PM, "Tri Doan" wrote:

I would like to modify the simple word count program so that I can produce text files from given HTML files, by extracting only the text content between <title> and </title> and between <body> and </body>. When I try to modify the map and reduce tasks, it seems that I cannot replace the IntWritable. The error is:

10/10/16 09:07:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/10/16 09:07:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/16 09:07:18 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
10/10/16 09:07:18 INFO mapred.FileInputFormat: Total input paths to process : 20
10/10/16 09:07:19 INFO mapred.JobClient: Running job: job_local_0001
10/10/16 09:07:19 INFO mapred.FileInputFormat: Total input paths to process : 20
10/10/16 09:07:19 INFO mapred.MapTask: numReduceTasks: 1
10/10/16 09:07:19 INFO mapred.MapTask: io.sort.mb = 100
10/10/16 09:07:19 INFO mapred.MapTask: data buffer = 79691776/99614720
10/10/16 09:07:19 INFO mapred.MapTask: record buffer = 262144/327680
10/10/16 09:07:19 INFO mapred.MapTask: Starting flush of map output
10/10/16 09:07:20 INFO mapred.JobClient: map 0% reduce 0%
10/10/16 09:07:21 WARN mapred.LocalJobRunner: job_local_0001
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable   <<<--------------------
        at WordProcess$Reduce.reduce(WordProcess.java:44)
        at WordProcess$Reduce.reduce(WordProcess.java:1)
        at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1151)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
10/10/16 09:07:22 INFO mapred.JobClient: Job complete: job_local_0001
10/10/16 09:07:22 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!   <--------------------------------------
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at WordProcess.main(WordProcess.java:88)

my code is:

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordProcess {

      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, Text> {
        private final static IntWritable one = new IntWritable(1); // leftover from word count, unused
        private Text id = new Text();
        private Text value = new Text(); // unused; shadowed by the map() parameter

        public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
          String line = value.toString();
          // Key each line by the name of the file it came from.
          FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
          String fileName = fileSplit.getPath().getName();
          id.set(fileName);
          value.set(line);
          output.collect(id, value);
        }
      }

      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, Text> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
          int sum = 0; // leftover from word count, unused
          String str = "";
          String substr1, substr2;
          Text text = new Text();
          // values is declared over IntWritable, but the mapper emits Text:
          // the next() call below is where the ClassCastException is thrown.
          while (values.hasNext()) {
            String s = values.next().toString();
            str = str.concat(s);
          }
          // Extract the content between the tags. (The tag literals were
          // stripped by the mail archiver; <title>/<body> are assumed here
          // from the description above.)
          int x1 = str.indexOf("<title>");
          int y1 = str.indexOf("</title>");
          substr1 = str.substring(x1 + 7, y1);
          int x2 = str.indexOf("<body>");
          int y2 = str.indexOf("</body>");
          substr2 = str.substring(x2 + 5, y2);
          str = substr1 + " " + substr2;
          text.set(str);
          output.collect(key, text);
          System.out.println(key + "," + text);
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordProcess.class);
        conf.setJobName("wordprocess");

        conf.setOutputKeyClass(Text.class);
        // conf.setOutputValueClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // delete the output directory if it exists already
        FileSystem.get(conf).delete(new Path(args[1]), true);

        JobClient.runJob(conf);
      }
    }

Does anyone have experience with this problem? Please tell me how to fix it. Thanks in advance.

best regards

Tri Doan
1429 Laramie Apt 3, Manhattan KS 66502
USA
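One further note, grounded in the stack trace above: the ClassCastException is thrown from Task$OldCombinerRunner.combine, i.e. while Reduce is running as a combiner during the map-side spill (conf.setCombinerClass(Reduce.class)). Even after the type fix, this reduce is unlikely to be a safe combiner: a combiner may run zero or more times on partial groups of values, while this logic assumes it sees a file's complete content before parsing out the tags. A sketch of the driver setup with the combiner simply dropped (everything here is from the quoted code except the removal itself):

    JobConf conf = new JobConf(WordProcess.class);
    conf.setJobName("wordprocess");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(Map.class);
    // No setCombinerClass(...): the reduce concatenates partial content and
    // parses tags out of it, which is not valid on partial value groups.
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);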