From: Harsh J
To: common-user@hadoop.apache.org
Date: Sat, 16 Oct 2010 21:38:12 +0530
Subject: Re: pls clarify on this

You can emit whichever Writable you like, but as per your given code, your Reducer class (the class definition line, specifically) is looking for an IntWritable in the value's iterable. Change that to Text and it should do what you expect :)

For reference, look into JobConf.setMapOutputKeyClass(...) and JobConf.setMapOutputValueClass(...) for configuring the intermediate key and value types (which become a reducer's input) explicitly.
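A minimal sketch of that explicit configuration, against the old org.apache.hadoop.mapred API used throughout this thread (conf is the JobConf from the WordProcess code quoted below; the Text choices match what its mapper emits):

    JobConf conf = new JobConf(WordProcess.class);

    // Intermediate (map output) key/value types; these must match what the
    // mapper emits and what the reducer declares as its input:
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);

    // Final (job output) types, as already set in the quoted code:
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

When the map-output classes are left unset they default to the job-output classes, and generics are erased at runtime, so the mismatch in the quoted code only surfaces as the ClassCastException shown below, at the first values.next() call.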
On Oct 16, 2010 9:14 PM, "Tri Doan" wrote:

Hi Harsh J,

Since I use the map function to emit a pair of (file id, content), which the reduce function then uses to combine all the text content belonging to the same file and extract only the content between the tags, I thought I could change the OutputCollector's value type to achieve this goal. Can I emit a Text object? Or what should I do to emit something like a Text together with a file id, so that reduce can combine them for further processing?

best regards

Tri Doan
1429 Laramie Apt 3, Manhattan KS 66502
USA
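For reference, the class-definition change Harsh J describes would look roughly like this against the WordProcess code quoted below (a sketch only: the value type in the Reducer declaration and the reduce() signature changes from IntWritable to Text; the tag extraction is elided):

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Concatenate every chunk of this file's content, as the original does.
        StringBuilder str = new StringBuilder();
        while (values.hasNext()) {
          str.append(values.next().toString());
        }
        output.collect(key, new Text(str.toString()));
      }
    }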
----- Original Message -----
From: "Harsh J"
To: common-user@hadoop.apache.org
Sent: Saturday, October 16, 2010 8:23:08 AM
Subject: Re: how to fix this error

Your mapper must emit an IntWritable as the value's type if you want to use that in your reducer. Right now you are emitting a Text object instead.

On Oct 16, 2010 8:27 PM, "Tri Doan" wrote:

I would like to modify the simple word count program so that I can produce text files from given HTML files, by extracting only the text content between <title> and </title> and between <body> and </body>. When I try to modify the map and reduce tasks, it seems that I cannot replace the IntWritable. The error is:

10/10/16 09:07:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/10/16 09:07:18 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/16 09:07:18 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
10/10/16 09:07:18 INFO mapred.FileInputFormat: Total input paths to process : 20
10/10/16 09:07:19 INFO mapred.JobClient: Running job: job_local_0001
10/10/16 09:07:19 INFO mapred.FileInputFormat: Total input paths to process : 20
10/10/16 09:07:19 INFO mapred.MapTask: numReduceTasks: 1
10/10/16 09:07:19 INFO mapred.MapTask: io.sort.mb = 100
10/10/16 09:07:19 INFO mapred.MapTask: data buffer = 79691776/99614720
10/10/16 09:07:19 INFO mapred.MapTask: record buffer = 262144/327680
10/10/16 09:07:19 INFO mapred.MapTask: Starting flush of map output
10/10/16 09:07:20 INFO mapred.JobClient: map 0% reduce 0%
10/10/16 09:07:21 WARN mapred.LocalJobRunner: job_local_0001
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable   <<<--------------------
        at WordProcess$Reduce.reduce(WordProcess.java:44)
        at WordProcess$Reduce.reduce(WordProcess.java:1)
        at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1151)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
10/10/16 09:07:22 INFO mapred.JobClient: Job complete: job_local_0001
10/10/16 09:07:22 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!   <--------------------------------------
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at WordProcess.main(WordProcess.java:88)

my code is:

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class WordProcess {

      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, Text> {
        private final static IntWritable one = new IntWritable(1); // leftover from word count, unused
        private Text id = new Text();
        private Text value = new Text(); // unused; shadowed by the map() parameter

        public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
          String line = value.toString();
          // Key each line by the name of the file it came from.
          FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
          String fileName = fileSplit.getPath().getName();
          id.set(fileName);
          value.set(line);
          output.collect(id, value);
        }
      }

      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, Text> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
          int sum = 0; // leftover from word count, unused
          String str = "";
          String substr1, substr2;
          Text text = new Text();
          // values is declared over IntWritable, but the mapper emits Text:
          // the next() call below is where the ClassCastException is thrown.
          while (values.hasNext()) {
            String s = values.next().toString();
            str = str.concat(s);
          }
          // Extract the content between the tags. (The tag literals were
          // stripped by the mail archiver; <title>/<body> are assumed here
          // from the description above.)
          int x1 = str.indexOf("<title>");
          int y1 = str.indexOf("</title>");
          substr1 = str.substring(x1 + 7, y1);
          int x2 = str.indexOf("<body>");
          int y2 = str.indexOf("</body>");
          substr2 = str.substring(x2 + 5, y2);
          str = substr1 + " " + substr2;
          text.set(str);
          output.collect(key, text);
          System.out.println(key + "," + text);
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordProcess.class);
        conf.setJobName("wordprocess");

        conf.setOutputKeyClass(Text.class);
        // conf.setOutputValueClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // delete the output directory if it exists already
        FileSystem.get(conf).delete(new Path(args[1]), true);

        JobClient.runJob(conf);
      }
    }

Does anyone have experience with this problem? Please tell me how to fix it. Thanks in advance.

best regards

Tri Doan
1429 Laramie Apt 3, Manhattan KS 66502
USA
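One further note, grounded in the stack trace above: the ClassCastException is thrown from Task$OldCombinerRunner.combine, i.e. while Reduce is running as a combiner during the map-side spill (conf.setCombinerClass(Reduce.class)). Even after the type fix, this reduce is unlikely to be a safe combiner: a combiner may run zero or more times on partial groups of values, while this logic assumes it sees a file's complete content before parsing out the tags. A sketch of the driver setup with the combiner simply dropped (everything here is from the quoted code except the removal itself):

    JobConf conf = new JobConf(WordProcess.class);
    conf.setJobName("wordprocess");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(Map.class);
    // No setCombinerClass(...): the reduce concatenates partial content and
    // parses tags out of it, which is not valid on partial value groups.
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);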