Subject: Re: Need FileName with Content
From: Ranjini Rathinam <ranjinibecse@gmail.com>
To: user@hadoop.apache.org, sshi@gopivotal.com
Date: Fri, 21 Mar 2014 16:38:59 +0530
List: user@hadoop.apache.org

Hi,

Thanks for the great support. I have fixed the issue and now get the output.

But I have one query: is it possible to give a runtime argument to the mapper class? For example, passing the values C,JAVA at runtime instead of hard-coding them here:

    if((sp[k].equalsIgnoreCase("C"))){
        while (itr.hasMoreTokens()) {
            word.set(pp.getName() + " " + itr.nextToken());
            context.write(word, one);
        }
    }
    if((sp[k].equalsIgnoreCase("JAVA"))){
        while (itr.hasMoreTokens()) {
            word.set(pp.getName() + " " + itr.nextToken());
            context.write(word, one);
        }
    }

Thanks a lot.

Ranjini
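One way to do that (a minimal sketch, not from the thread; the property name "wordcount.keywords" and the use of args[2] are only illustrative) is to pass the keyword list through the job Configuration in the driver and read it back in the mapper's setup() method:

    // Driver: set the property before creating the Job, e.g. from a command-line argument
    Configuration conf = new Configuration();
    conf.set("wordcount.keywords", args[2]);          // runtime value, e.g. "C,JAVA"
    Job job = new Job(conf, "FileCount");

    // Mapper: load the keywords once per task in setup(), then test membership in map()
    private Set<String> keywords = new HashSet<String>();

    @Override
    protected void setup(Context context) {
        String list = context.getConfiguration().get("wordcount.keywords", "");
        for (String kw : list.split(",")) {
            keywords.add(kw.trim().toLowerCase());
        }
    }

    // ...and inside map(), instead of the two hard-coded equalsIgnoreCase checks:
    if (keywords.contains(sp[k].toLowerCase())) {
        while (itr.hasMoreTokens()) {
            word.set(pp.getName() + " " + itr.nextToken());
            context.write(word, one);
        }
    }

The same job could then be submitted with different keyword lists without recompiling.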
On Fri, Mar 21, 2014 at 11:45 AM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:

> Hi,
>
> Thanks a lot for the great support. I am just learning hadoop and
> mapreduce.
>
> I have used the way you have guided me.
>
> But the output is coming without aggregating:
>
> vinitha.txt C       1
> vinitha.txt Java    1
> vinitha.txt Java    1
> vinitha.txt Java    1
> vinitha.txt Java    1
>
> I need the output as:
>
> vinitha    C       1
> vinitha    Java    4
>
> I have a reduce class but am still not able to fix it; I am still trying.
>
> I have given my code below. Please let me know where I have gone wrong.
>
> My code:
>
> import java.io.IOException;
> import java.util.*;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.InputSplit;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.Mapper;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.input.FileSplit;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class FileCount {
>
>     public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
>
>         private final static IntWritable one = new IntWritable(1);
>         private Text word = new Text();
>
>         public void map(LongWritable key, Text value, Context context)
>                 throws IOException, InterruptedException {
>
>             FileSplit fileSplit;
>             InputSplit is = context.getInputSplit();
>             FileSystem fs = FileSystem.get(context.getConfiguration());
>             fileSplit = (FileSplit) is;
>             Path pp = fileSplit.getPath();
>             String line = value.toString();
>             int i = 0;
>             int k = 0;
>
>             String[] splited = line.split("\\s+");
>             for (i = 0; i < splited.length; i++) {
>                 String sp[] = splited[i].split(",");
>                 for (k = 0; k < sp.length; k++) {
>                     if (!sp[k].isEmpty()) {
>                         StringTokenizer itr = new StringTokenizer(sp[k]);
>                         if (sp[k].equalsIgnoreCase("C")) {
>                             while (itr.hasMoreTokens()) {
>                                 word.set(pp.getName() + " " + itr.nextToken());
>                                 context.write(word, one);
>                             }
>                         }
>                         if (sp[k].equalsIgnoreCase("JAVA")) {
>                             while (itr.hasMoreTokens()) {
>                                 word.set(pp.getName() + " " + itr.nextToken());
>                                 context.write(word, one);
>                             }
>                         }
>                     }
>                 }
>             }
>         }
>     }
>
>     public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
>
>         public void reduce(Text key, Iterator<IntWritable> values, Context context)
>                 throws IOException, InterruptedException {
>
>             int sum = 0;
>             while (values.hasNext()) {
>                 sum += values.next().get();
>             }
>             context.write(key, new IntWritable(sum));
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         Job job = new Job(conf, "jobName");
>
>         String input = "/user/hduser/INPUT/";
>         String output = "/user/hduser/OUTPUT/";
>         FileInputFormat.setInputPaths(job, input);
>         job.setJarByClass(FileCount.class);
>         job.setMapperClass(TokenizerMapper.class);
>         job.setReducerClass(Reduce.class);
>         job.setCombinerClass(Reduce.class);
>         job.setInputFormatClass(TextInputFormat.class);
>         job.setOutputKeyClass(Text.class);
>         job.setOutputValueClass(IntWritable.class);
>         Path outPath = new Path(output);
>         FileOutputFormat.setOutputPath(job, outPath);
>         FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
>         if (dfs.exists(outPath)) {
>             dfs.delete(outPath, true);
>         }
>
>         try {
>             job.waitForCompletion(true);
>         } catch (InterruptedException ex) {
>             // Logger.getLogger(FileCount.class.getName()).log(Level.SEVERE, null, ex);
>         } catch (ClassNotFoundException ex) {
>             // Logger.getLogger(FileCount.class.getName()).log(Level.SEVERE, null, ex);
>         }
>     }
> }
>
> Thanks in advance for the great help and support to fix the issue.
>
> Please help to fix it.
>
> Thanks a lot.
>
> Regards,
> Ranjini
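A note on the aggregation problem above (a sketch, not a reply from the thread): with the org.apache.hadoop.mapreduce API used in FileCount, reduce() must take an Iterable<IntWritable>. A method declared as reduce(Text, Iterator<IntWritable>, Context) never overrides Reducer.reduce(), so the framework's default identity reduce runs and every (word, 1) pair is written through unaggregated, which matches the output shown. A minimal corrected sketch of the Reduce class:

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override   // fails to compile if the signature does not match, which catches this mistake
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }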
>> Hi,
>>
>> I have a folder named INPUT.
>>
>> Inside INPUT there are 5 resumes.
>>
>> hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
>> Found 5 items
>> -rw-r--r--   1 hduser supergroup       5438 2014-03-18 15:20 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
>> -rw-r--r--   1 hduser supergroup       6022 2014-03-18 15:22 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21 /user/hduser/INPUT/vinitha.txt
>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21 /user/hduser/INPUT/sony.txt
>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21 /user/hduser/INPUT/ravi.txt
>> hduser@localhost:~/Ranjini$
>>
>> I have to process the folder and its contents.
>>
>> I need output as:
>>
>> filename   word     occurrence
>> vinitha    java     4
>> sony       oracle   3
>>
>> But I am not getting the filename. As the input file contents are merged,
>> the file name is not coming out correct.
>>
>> Please help to fix this issue. I have given my code below.
>>
>> import java.io.BufferedReader;
>> import java.io.IOException;
>> import java.io.InputStreamReader;
>> import java.util.*;
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FSDataInputStream;
>> import org.apache.hadoop.fs.FileStatus;
>> import org.apache.hadoop.fs.FileSystem;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.*;
>> import org.apache.hadoop.mapred.*;
>> import org.apache.hadoop.util.*;
>>
>> public class WordCount {
>>     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
>>         private final static IntWritable one = new IntWritable(1);
>>         private Text word = new Text();
>>
>>         public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
>>             FSDataInputStream fs = null;
>>             FileSystem hdfs = null;
>>             String line = value.toString();
>>             int i = 0, k = 0;
>>             try {
>>                 Configuration configuration = new Configuration();
>>                 configuration.set("fs.default.name", "hdfs://localhost:4440/");
>>
>>                 Path srcPath = new Path("/user/hduser/INPUT/");
>>
>>                 hdfs = FileSystem.get(configuration);
>>                 FileStatus[] status = hdfs.listStatus(srcPath);
>>                 fs = hdfs.open(srcPath);
>>                 BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(srcPath)));
>>
>>                 String[] splited = line.split("\\s+");
>>                 for (i = 0; i < splited.length; i++) {
>>                     String sp[] = splited[i].split(",");
>>                     for (k = 0; k < sp.length; k++) {
>>
>>                         if (!sp[k].isEmpty()) {
>>                             StringTokenizer tokenizer = new StringTokenizer(sp[k]);
>>                             if (sp[k].equalsIgnoreCase("C")) {
>>                                 while (tokenizer.hasMoreTokens()) {
>>                                     word.set(tokenizer.nextToken());
>>                                     output.collect(word, one);
>>                                 }
>>                             }
>>                             if (sp[k].equalsIgnoreCase("JAVA")) {
>>                                 while (tokenizer.hasMoreTokens()) {
>>                                     word.set(tokenizer.nextToken());
>>                                     output.collect(word, one);
>>                                 }
>>                             }
>>                         }
>>                     }
>>                 }
>>             } catch (IOException e) {
>>                 e.printStackTrace();
>>             }
>>         }
>>     }
>>
>>     public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
>>         public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
>>             int sum = 0;
>>             while (values.hasNext()) {
>>                 sum += values.next().get();
>>             }
>>             output.collect(key, new IntWritable(sum));
>>         }
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         JobConf conf = new JobConf(WordCount.class);
>>         conf.setJobName("wordcount");
>>         conf.setOutputKeyClass(Text.class);
>>         conf.setOutputValueClass(IntWritable.class);
>>         conf.setMapperClass(Map.class);
>>         conf.setCombinerClass(Reduce.class);
>>         conf.setReducerClass(Reduce.class);
>>         conf.setInputFormat(TextInputFormat.class);
>>         conf.setOutputFormat(TextOutputFormat.class);
>>         FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>         FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>         JobClient.runJob(conf);
>>     }
>> }
>>
>> Please help.
>>
>> Thanks in advance.
>>
>> Ranjini
>>
>>
>> ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Thu, Mar 20, 2014 at 7:39 AM
>> To: user@hadoop.apache.org
>>
>> You want to do a word count for each file, but the code gives you a word
>> count for all the files, right?
>>
>> =====
>> word.set(tokenizer.nextToken());
>> output.collect(word, one);
>> ======
>> change it to:
>> word.set("filename" + " " + tokenizer.nextToken());
>> output.collect(word, one);
>>
>> Regards,
>> Stanley Shi
>>
>>
>> ----------
>> From: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Date: Thu, Mar 20, 2014 at 10:56 AM
>> To: ranjini.r@polarisft.com
>>
>>
>> ----------
>> From: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Date: Thu, Mar 20, 2014 at 11:20 AM
>> To: user@hadoop.apache.org, sshi@gopivotal.com
>>
>> Hi,
>>
>> If we give the below code,
>> =======================
>> word.set("filename" + " " + tokenizer.nextToken());
>> output.collect(word, one);
>> ======================
>>
>> the output is wrong, because it shows
>>
>> filename   word     occurrence
>> vinitha    java     4
>> vinitha    oracle   3
>> sony       java     4
>> sony       oracle   3
>>
>> Here vinitha does not have the word oracle. Similarly, sony does not have
>> the word java. The file name is being merged with all the words.
>>
>> I need the output as given below:
>>
>> filename   word     occurrence
>> vinitha    java     4
>> vinitha    C++      3
>> sony       ETL      4
>> sony       oracle   3
>>
>> I need the fileName along with the words in that particular file only. No
>> merge should happen.
>>
>> Please help me out with this issue.
>>
>> Thanks in advance.
>>
>> Ranjini
>>
>> ----------
>> From: Felix Chern <idryman@gmail.com>
>> Date: Thu, Mar 20, 2014 at 11:25 PM
>> To: user@hadoop.apache.org
>> Cc: sshi@gopivotal.com
>>
>> I've written two blog posts on how to get the directory context in a
>> hadoop mapper:
>>
>> http://www.idryman.org/blog/2014/01/26/capture-directory-context-in-hadoop-mapper/
>> http://www.idryman.org/blog/2014/01/27/capture-path-info-in-hadoop-inputformat-class/
>>
>> Cheers,
>> Felix
>>
>> ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Fri, Mar 21, 2014 at 7:02 AM
>> To: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Cc: user@hadoop.apache.org
>>
>> Just reviewed the code again: you are not really using map-reduce. You
>> are reading all the files in one map process; this is not how a normal
>> map-reduce job works.
>>
>> Regards,
>> Stanley Shi
>>
>>
>> ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Fri, Mar 21, 2014 at 7:43 AM
>> To: Ranjini Rathinam <ranjinibecse@gmail.com>
>> Cc: user@hadoop.apache.org
>>
>> Change your mapper to be something like this:
>>
>> public static class TokenizerMapper extends
>>         Mapper<Object, Text, Text, IntWritable> {
>>
>>     private final static IntWritable one = new IntWritable(1);
>>     private Text word = new Text();
>>
>>     public void map(Object key, Text value, Context context)
>>             throws IOException, InterruptedException {
>>         // getInputSplit() gives the split for the current file, so its path
>>         // (and hence the file name) is available to every map() call
>>         Path pp = ((FileSplit) context.getInputSplit()).getPath();
>>         StringTokenizer itr = new StringTokenizer(value.toString());
>>         // "log" here assumes a logger field declared on the class
>>         log.info("map on string: " + new String(value.getBytes()));
>>         while (itr.hasMoreTokens()) {
>>             word.set(pp.getName() + " " + itr.nextToken());
>>             context.write(word, one);
>>         }
>>     }
>> }
>>
>> Note: add your filtering code here,
>> and then, when running the command, use your input path as a param.
>>
>> Regards,
>> Stanley Shi
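On that last note about passing the input path as a parameter, here is a minimal sketch (not from the thread; the jar name in the comment is only illustrative) of a FileCount driver that takes the input and output paths from the command line instead of the hard-coded /user/hduser/INPUT and /user/hduser/OUTPUT used in the main() above:

    public static void main(String[] args) throws Exception {
        // submit as e.g.: hadoop jar filecount.jar FileCount /user/hduser/INPUT /user/hduser/OUTPUT
        Configuration conf = new Configuration();
        Job job = new Job(conf, "FileCount");
        job.setJarByClass(FileCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory, must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }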