Subject: Re: Need FileName with Content
From: Ranjini Rathinam <ranjinibecse@gmail.com>
To: user@hadoop.apache.org, sshi@gopivotal.com
Date: Fri, 21 Mar 2014 16:38:59 +0530
List: user@hadoop.apache.org

Hi,

Thanks for the great support. I have fixed the issue and now get the output.

But I have one query: is it possible to give a runtime argument to the mapper class? For example, passing the values C,JAVA at runtime instead of hard-coding them here:

    if((sp[k].equalsIgnoreCase("C"))){
        while (itr.hasMoreTokens()) {
            word.set(pp.getName() + " " + itr.nextToken());
            context.write(word, one);
        }
    }
    if((sp[k].equalsIgnoreCase("JAVA"))){
        while (itr.hasMoreTokens()) {
            word.set(pp.getName() + " " + itr.nextToken());
            context.write(word, one);
        }
    }

Thanks a lot.

Ranjini
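One way to do that (a minimal sketch, not from the thread; the property name "wordcount.keywords" and the use of args[2] are only illustrative) is to pass the keyword list through the job Configuration in the driver and read it back in the mapper's setup() method:

    // Driver: set the property before creating the Job, e.g. from a command-line argument
    Configuration conf = new Configuration();
    conf.set("wordcount.keywords", args[2]);          // runtime value, e.g. "C,JAVA"
    Job job = new Job(conf, "FileCount");

    // Mapper: load the keywords once per task in setup(), then test membership in map()
    private Set<String> keywords = new HashSet<String>();

    @Override
    protected void setup(Context context) {
        String list = context.getConfiguration().get("wordcount.keywords", "");
        for (String kw : list.split(",")) {
            keywords.add(kw.trim().toLowerCase());
        }
    }

    // ...and inside map(), instead of the two hard-coded equalsIgnoreCase checks:
    if (keywords.contains(sp[k].toLowerCase())) {
        while (itr.hasMoreTokens()) {
            word.set(pp.getName() + " " + itr.nextToken());
            context.write(word, one);
        }
    }

The same job could then be submitted with different keyword lists without recompiling.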
On Fri, Mar 21, 2014 at 11:45 AM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:

> Hi,
>
> Thanks a lot for the great support. I am just learning hadoop and
> mapreduce.
>
> I have used the way you have guided me.
>
> But the output is coming without aggregating:
>
> vinitha.txt C       1
> vinitha.txt Java    1
> vinitha.txt Java    1
> vinitha.txt Java    1
> vinitha.txt Java    1
>
> I need the output as:
>
> vinitha    C       1
> vinitha    Java    4
>
> I have a reduce class but am still not able to fix it; I am still trying.
>
> I have given my code below. Please let me know where I have gone wrong.
>
> My code:
>
> import java.io.IOException;
> import java.util.*;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.InputSplit;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.input.FileSplit;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class FileCount {
>
>     public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
>
>         private final static IntWritable one = new IntWritable(1);
>         private Text word = new Text();
>
>         public void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>
>             FileSplit fileSplit;
>             InputSplit is = context.getInputSplit();
>             FileSystem fs = FileSystem.get(context.getConfiguration());
>             fileSplit = (FileSplit) is;
>             Path pp = fileSplit.getPath();
>             String line = value.toString();
>             int i = 0;
>             int k = 0;
>
>             String[] splited = line.split("\\s+");
>             for (i = 0; i < splited.length; i++) {
>                 String sp[] = splited[i].split(",");
>                 for (k = 0; k < sp.length; k++) {
>                     if (!sp[k].isEmpty()) {
>                         StringTokenizer itr = new StringTokenizer(sp[k]);
>                         if (sp[k].equalsIgnoreCase("C")) {
>                             while (itr.hasMoreTokens()) {
>                                 word.set(pp.getName() + " " + itr.nextToken());
>                                 context.write(word, one);
>                             }
>                         }
>                         if (sp[k].equalsIgnoreCase("JAVA")) {
>                             while (itr.hasMoreTokens()) {
>                                 word.set(pp.getName() + " " + itr.nextToken());
>                                 context.write(word, one);
>                             }
>                         }
>                     }
>                 }
>             }
>         }
>     }
>
>     public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
>
>         public void reduce(Text key, Iterator<IntWritable> values, Context context)
>                 throws IOException, InterruptedException {
>
>             int sum = 0;
>             while (values.hasNext()) {
>                 sum += values.next().get();
>             }
>             context.write(key, new IntWritable(sum));
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         Job job = new Job(conf, "jobName");
>
>         String input = "/user/hduser/INPUT/";
>         String output = "/user/hduser/OUTPUT/";
>         FileInputFormat.setInputPaths(job, input);
>         job.setJarByClass(FileCount.class);
>         job.setMapperClass(TokenizerMapper.class);
>         job.setReducerClass(Reduce.class);
>         job.setCombinerClass(Reduce.class);
>         job.setInputFormatClass(TextInputFormat.class);
>         job.setOutputKeyClass(Text.class);
>         job.setOutputValueClass(IntWritable.class);
>         Path outPath = new Path(output);
>         FileOutputFormat.setOutputPath(job, outPath);
>         FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
>         if (dfs.exists(outPath)) {
>             dfs.delete(outPath, true);
>         }
>
>         try {
>             job.waitForCompletion(true);
>         } catch (InterruptedException ex) {
>             // Logger.getLogger(FileCount.class.getName()).log(Level.SEVERE, null, ex);
>         } catch (ClassNotFoundException ex) {
>             // Logger.getLogger(FileCount.class.getName()).log(Level.SEVERE, null, ex);
>         }
>     }
> }
>
> Thanks in advance for the great help and support to fix the issue.
>
> Please help to fix it.
>
> Thanks a lot.
>
> Regards,
> Ranjini
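A note on the aggregation problem above (a sketch, not a reply from the thread): with the org.apache.hadoop.mapreduce API used in FileCount, reduce() must take an Iterable<IntWritable>. A method declared as reduce(Text, Iterator<IntWritable>, Context) never overrides Reducer.reduce(), so the framework's default identity reduce runs and every (word, 1) pair is written through unaggregated, which matches the output shown. A minimal corrected sketch of the Reduce class:

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override   // fails to compile if the signature does not match, which catches this mistake
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }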
>> Hi,
>>
>> I have a folder named INPUT.
>>
>> Inside INPUT there are 5 resumes.
>>
>> hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
>> Found 5 items
>> -rw-r--r--   1 hduser supergroup       5438 2014-03-18 15:20 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
>> -rw-r--r--   1 hduser supergroup       6022 2014-03-18 15:22 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21 /user/hduser/INPUT/vinitha.txt
>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21 /user/hduser/INPUT/sony.txt
>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21 /user/hduser/INPUT/ravi.txt
>> hduser@localhost:~/Ranjini$
>>
>> I have to process the folder and its contents.
>>
>> I need output as:
>>
>> filename   word     occurrence
>> vinitha    java     4
>> sony       oracle   3
>>
>> But I am not getting the filename. As the input file contents are merged,
>> the file name is not coming out correct.
>>
>> Please help to fix this issue. I have given my code below.
>>
>> import java.io.BufferedReader;
>> import java.io.IOException;
>> import java.io.InputStreamReader;
>> import java.util.*;
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FSDataInputStream;
>> import org.apache.hadoop.fs.FileStatus;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.*;
>> import org.apache.hadoop.mapred.*;
>> import org.apache.hadoop.util.*;
>>
>> public class WordCount {
>>     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
>>         private final static IntWritable one = new IntWritable(1);
>>         private Text word = new Text();
>>
>>         public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
>>             FSDataInputStream fs = null;
>>             FileSystem hdfs = null;
>>             String line = value.toString();
>>             int i = 0, k = 0;
>>             try {
>>                 Configuration configuration = new Configuration();
>>                 configuration.set("fs.default.name", "hdfs://localhost:4440/");
>>
>>                 Path srcPath = new Path("/user/hduser/INPUT/");
>>
>>                 hdfs = FileSystem.get(configuration);
>>                 FileStatus[] status = hdfs.listStatus(srcPath);
>>                 fs = hdfs.open(srcPath);
>>                 BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(srcPath)));
>>
>>                 String[] splited = line.split("\\s+");
>>                 for (i = 0; i < splited.length; i++) {
>>                     String sp[] = splited[i].split(",");
>>                     for (k = 0; k < sp.length; k++) {
>>
>>                         if (!sp[k].isEmpty()) {
>>                             StringTokenizer tokenizer = new StringTokenizer(sp[k]);
>>                             if (sp[k].equalsIgnoreCase("C")) {
>>                                 while (tokenizer.hasMoreTokens()) {
>>                                     word.set(tokenizer.nextToken());
>>                                     output.collect(word, one);
>>                                 }
>>                             }
>>                             if (sp[k].equalsIgnoreCase("JAVA")) {
>>                                 while (tokenizer.hasMoreTokens()) {
>>                                     word.set(tokenizer.nextToken());
>>                                     output.collect(word, one);
>>                                 }
>>                             }
>>                         }
>>                     }
>>                 }
>>             } catch (IOException e) {
>>                 e.printStackTrace();
>>             }
>>         }
>>     }
>>
>>     public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
>>         public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
>>             int sum = 0;
>>             while (values.hasNext()) {
>>                 sum += values.next().get();
>>             }
>>             output.collect(key, new IntWritable(sum));
>>         }
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         JobConf conf = new JobConf(WordCount.class);
>>         conf.setJobName("wordcount");
>>         conf.setOutputKeyClass(Text.class);
>>         conf.setOutputValueClass(IntWritable.class);
>>         conf.setMapperClass(Map.class);
>>         conf.setCombinerClass(Reduce.class);
>>         conf.setReducerClass(Reduce.class);
>>         conf.setInputFormat(TextInputFormat.class);
>>         conf.setOutputFormat(TextOutputFormat.class);
>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>         JobClient.runJob(conf);
>>     }
>> }
>>
>> Please help.
>>
>> Thanks in advance.
>>
>> Ranjini
>>
>>
>> ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Thu, Mar 20, 2014 at 7:39 AM
>> To: user@hadoop.apache.org
>>
>> You want to do a word count for each file, but the code gives you a word
>> count for all the files, right?
>>
>> =====
>> word.set(tokenizer.nextToken());
>> output.collect(word, one);
>> ======
>> change it to:
>> word.set("filename" + " " + tokenizer.nextToken());
>> output.collect(word, one);
>>
>> Regards,
>> Stanley Shi
>>
>>
>> ----------
>> From: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Date: Thu, Mar 20, 2014 at 10:56 AM
>> To: ranjini.r@polarisft.com
>>
>>
>> ----------
>> From: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Date: Thu, Mar 20, 2014 at 11:20 AM
>> To: user@hadoop.apache.org, sshi@gopivotal.com
>>
>> Hi,
>>
>> If we give the below code,
>> =======================
>> word.set("filename" + " " + tokenizer.nextToken());
>> output.collect(word, one);
>> ======================
>>
>> the output is wrong, because it shows
>>
>> filename   word     occurrence
>> vinitha    java     4
>> vinitha    oracle   3
>> sony       java     4
>> sony       oracle   3
>>
>> Here vinitha does not have the word oracle. Similarly, sony does not have
>> the word java. The file name is being merged with all the words.
>>
>> I need the output as given below:
>>
>> filename   word     occurrence
>> vinitha    java     4
>> vinitha    C++      3
>> sony       ETL      4
>> sony       oracle   3
>>
>> I need the fileName along with the words in that particular file only. No
>> merge should happen.
>>
>> Please help me out with this issue.
>>
>> Thanks in advance.
>>
>> Ranjini
>>
>> ----------
>> From: Felix Chern <idryman@gmail.com>
>> Date: Thu, Mar 20, 2014 at 11:25 PM
>> To: user@hadoop.apache.org
>> Cc: sshi@gopivotal.com
>>
>> I've written two blog posts on how to get the directory context in a
>> hadoop mapper:
>>
>> http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
>> http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/
>>
>> Cheers,
>> Felix
>>
>> ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Fri, Mar 21, 2014 at 7:02 AM
>> To: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Cc: user@hadoop.apache.org
>>
>> Just reviewed the code again: you are not really using map-reduce. You
>> are reading all the files in one map process; this is not how a normal
>> map-reduce job works.
>>
>> Regards,
>> Stanley Shi
>>
>>
>> ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Fri, Mar 21, 2014 at 7:43 AM
>> To: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Cc: user@hadoop.apache.org
>>
>> Change your mapper to be something like this:
>>
>> public static class TokenizerMapper extends
>>         Mapper<Object, Text, Text, IntWritable> {
>>
>>     private final static IntWritable one = new IntWritable(1);
>>     private Text word = new Text();
>>
>>     public void map(Object key, Text value, Context context)
>>             throws IOException, InterruptedException {
>>         // getInputSplit() gives the split for the current file, so its path
>>         // (and hence the file name) is available to every map() call
>>         Path pp = ((FileSplit) context.getInputSplit()).getPath();
>>         StringTokenizer itr = new StringTokenizer(value.toString());
>>         // "log" here assumes a logger field declared on the class
>>         log.info("map on string: " + new String(value.getBytes()));
>>         while (itr.hasMoreTokens()) {
>>             word.set(pp.getName() + " " + itr.nextToken());
>>             context.write(word, one);
>>         }
>>     }
>> }
>>
>> Note: add your filtering code here,
>> and then, when running the command, use your input path as a param.
>>
>> Regards,
>> Stanley Shi
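On that last note about passing the input path as a parameter, here is a minimal sketch (not from the thread; the jar name in the comment is only illustrative) of a FileCount driver that takes the input and output paths from the command line instead of the hard-coded /user/hduser/INPUT and /user/hduser/OUTPUT used in the main() above:

    public static void main(String[] args) throws Exception {
        // submit as e.g.: hadoop jar filecount.jar FileCount /user/hduser/INPUT /user/hduser/OUTPUT
        Configuration conf = new Configuration();
        Job job = new Job(conf, "FileCount");
        job.setJarByClass(FileCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory, must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }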