hadoop-user mailing list archives

From Stanley Shi <s...@gopivotal.com>
Subject Re: Need FileName with Content
Date Fri, 21 Mar 2014 01:32:46 GMT
Just reviewed the code again: you are not really using MapReduce. You are
reading all the files in one map process, which is not how a normal
MapReduce job works.
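
A rough, untested sketch of a mapper along those lines (assuming the old
org.apache.hadoop.mapred API and TextInputFormat; the class name and key
layout are just illustrative): let the framework hand the mapper one line
of one file at a time, and take the file name from the input split instead
of hard-coding it.

=====
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PerFileWordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // The input split tells us which file the current line came from.
    String fileName =
        ((FileSplit) reporter.getInputSplit()).getPath().getName();
    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens()) {
      // Key the count by "fileName<TAB>word" so counts stay per file.
      word.set(fileName + "\t" + tokenizer.nextToken());
      output.collect(word, one);
    }
  }
}
=====

With the key built this way, the existing sum reducer produces a separate
count for each (file, word) pair, which is the output format you asked for.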


Regards,
*Stanley Shi,*



On Thu, Mar 20, 2014 at 1:50 PM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:

> Hi,
>
> If we use the below code,
> =======================
> word.set("filename"+"    "+tokenizer.nextToken());
> output.collect(word,one);
> ======================
>
> The output is wrong, because it shows:
>
> filename   word     occurrence
> vinitha    java     4
> vinitha    oracle   3
> sony       java     4
> sony       oracle   3
>
> Here the file vinitha does not contain the word oracle, and similarly
> sony does not contain the word java. The file names are being merged
> with all the words.
>
> I need the output as given below:
>
> filename   word     occurrence
> vinitha    java     4
> vinitha    C++      3
> sony       ETL      4
> sony       oracle   3
>
> I need the fileName along with only the words present in that particular
> file. No merging should happen.
>
> Please help me out with this issue.
>
> Please help.
>
> Thanks in advance.
>
> Ranjini
>
>
>
>
> On Thu, Mar 20, 2014 at 10:56 AM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:
>
>
>>
>> ---------- Forwarded message ----------
>> From: Stanley Shi <sshi@gopivotal.com>
>> Date: Thu, Mar 20, 2014 at 7:39 AM
>> Subject: Re: Need FileName with Content
>> To: user@hadoop.apache.org
>>
>>
>> You want to do a word count for each file, but the code gives you a word
>> count across all the files, right?
>>
>> =====
>>  word.set(tokenizer.nextToken());
>>           output.collect(word, one);
>> ======
>> change it to:
>> word.set("filename"+"    "+tokenizer.nextToken());
>> output.collect(word,one);
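>>
>> (Note: "filename" above is a placeholder, not a literal string. Assuming
>> TextInputFormat and the old mapred API, the name of the file being
>> processed can be read from the input split, for example:
>>
>> String fileName = ((FileSplit) reporter.getInputSplit()).getPath().getName();
>>
>> where FileSplit is org.apache.hadoop.mapred.FileSplit.)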
>>
>>
>>
>>
>>  Regards,
>> *Stanley Shi,*
>>
>>
>>
>> On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam <ranjinibecse@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a folder named INPUT.
>>>
>>> Inside INPUT there are 5 resumes.
>>>
>>> hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
>>> Found 5 items
>>> -rw-r--r--   1 hduser supergroup       5438 2014-03-18 15:20
>>> /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
>>> -rw-r--r--   1 hduser supergroup       6022 2014-03-18 15:22
>>> /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
>>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21
>>> /user/hduser/INPUT/vinitha.txt
>>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21
>>> /user/hduser/INPUT/sony.txt
>>> -rw-r--r--   1 hduser supergroup       3517 2014-03-18 15:21
>>> /user/hduser/INPUT/ravi.txt
>>> hduser@localhost:~/Ranjini$
>>>
>>> I have to process the folder and its contents.
>>>
>>> I need output as:
>>>
>>> filename   word     occurrence
>>> vinitha    java     4
>>> sony       oracle   3
>>>
>>>
>>>
>>> But I am not getting the filename. As the input file contents are
>>> merged, the file name is not coming out correctly.
>>>
>>> Please help me fix this issue. I have given my code below.
>>>
>>>
>>>  import java.io.BufferedReader;
>>>  import java.io.IOException;
>>>  import java.io.InputStreamReader;
>>>  import java.util.*;
>>>  import org.apache.hadoop.conf.Configuration;
>>>  import org.apache.hadoop.fs.FSDataInputStream;
>>>  import org.apache.hadoop.fs.FileStatus;
>>>  import org.apache.hadoop.fs.FileSystem;
>>>  import org.apache.hadoop.fs.Path;
>>>  import org.apache.hadoop.io.*;
>>>  import org.apache.hadoop.mapred.*;
>>>  import org.apache.hadoop.mapred.lib.*;
>>>  import org.apache.hadoop.util.*;
>>>
>>>  public class WordCount {
>>>
>>>    public static class Map extends MapReduceBase
>>>        implements Mapper<LongWritable, Text, Text, IntWritable> {
>>>      private final static IntWritable one = new IntWritable(1);
>>>      private Text word = new Text();
>>>
>>>      public void map(LongWritable key, Text value,
>>>          OutputCollector<Text, IntWritable> output, Reporter reporter)
>>>          throws IOException {
>>>        FSDataInputStream fs = null;
>>>        FileSystem hdfs = null;
>>>        String line = value.toString();
>>>        int i = 0, k = 0;
>>>        try {
>>>          Configuration configuration = new Configuration();
>>>          configuration.set("fs.default.name", "hdfs://localhost:4440/");
>>>
>>>          Path srcPath = new Path("/user/hduser/INPUT/");
>>>
>>>          hdfs = FileSystem.get(configuration);
>>>          FileStatus[] status = hdfs.listStatus(srcPath);
>>>          fs = hdfs.open(srcPath);
>>>          BufferedReader br = new BufferedReader(
>>>              new InputStreamReader(hdfs.open(srcPath)));
>>>
>>>          String[] splited = line.split("\\s+");
>>>          for (i = 0; i < splited.length; i++) {
>>>            String sp[] = splited[i].split(",");
>>>            for (k = 0; k < sp.length; k++) {
>>>              if (!sp[k].isEmpty()) {
>>>                StringTokenizer tokenizer = new StringTokenizer(sp[k]);
>>>                if (sp[k].equalsIgnoreCase("C")) {
>>>                  while (tokenizer.hasMoreTokens()) {
>>>                    word.set(tokenizer.nextToken());
>>>                    output.collect(word, one);
>>>                  }
>>>                }
>>>                if (sp[k].equalsIgnoreCase("JAVA")) {
>>>                  while (tokenizer.hasMoreTokens()) {
>>>                    word.set(tokenizer.nextToken());
>>>                    output.collect(word, one);
>>>                  }
>>>                }
>>>              }
>>>            }
>>>          }
>>>        } catch (IOException e) {
>>>          e.printStackTrace();
>>>        }
>>>      }
>>>    }
>>>     public static class Reduce extends MapReduceBase implements
>>> Reducer<Text, IntWritable, Text, IntWritable> {
>>>       public void reduce(Text key, Iterator<IntWritable> values,
>>> OutputCollector<Text, IntWritable> output, Reporter reporter) throws
>>> IOException {
>>>         int sum = 0;
>>>         while (values.hasNext()) {
>>>           sum += values.next().get();
>>>         }
>>>         output.collect(key, new IntWritable(sum));
>>>       }
>>>     }
>>>     public static void main(String[] args) throws Exception {
>>>
>>>
>>>       JobConf conf = new JobConf(WordCount.class);
>>>       conf.setJobName("wordcount");
>>>       conf.setOutputKeyClass(Text.class);
>>>       conf.setOutputValueClass(IntWritable.class);
>>>       conf.setMapperClass(Map.class);
>>>       conf.setCombinerClass(Reduce.class);
>>>       conf.setReducerClass(Reduce.class);
>>>       conf.setInputFormat(TextInputFormat.class);
>>>       conf.setOutputFormat(TextOutputFormat.class);
>>>       FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>       FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>       JobClient.runJob(conf);
>>>     }
>>>  }
>>>
>>>
>>>
>>> Please help
>>>
>>> Thanks in advance.
>>>
>>> Ranjini
>>>
>>>
>>>
>>
>>
>
