hadoop-mapreduce-user mailing list archives

From "Ramya S" <ram...@suntecgroup.com>
Subject RE: Sorting a csv file
Date Wed, 15 Jan 2014 11:39:24 GMT
All you need to do is change the map output value class to Text.
Set it accordingly in main (job.setOutputValueClass(Text.class)).
 
E.g.:
 
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    // Empty value: only the key matters for sorting.
    private final Text empty = new Text("");
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        System.out.println("in mapper");
        String line = value.toString();
        // Note: the default StringTokenizer splits on whitespace, not on commas.
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, empty);
            System.out.println("sort: " + word);
        }
    }
}
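One detail worth checking for CSV input: the mapper above builds its StringTokenizer with the default delimiters, which are whitespace characters, so a line such as "3,apple,red" comes out as a single token. The following is a minimal plain-Java sketch of the difference, outside Hadoop. The class name CsvTokenDemo and its two helper methods are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original job.
class CsvTokenDemo {

    // Tokenizes the way the mapper does: default delimiters are whitespace only.
    static List<String> whitespaceTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Tokenizes on commas instead, trimming spaces around each field.
    static List<String> commaTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ",");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken().trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "3,apple,red";
        System.out.println(whitespaceTokens(line)); // prints [3,apple,red] (one token)
        System.out.println(commaTokens(line));      // prints [3, apple, red]
    }
}
```

Note that StringTokenizer skips empty fields, so "a,,b" yields only two tokens. If empty CSV fields matter, line.split(",", -1) keeps them.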
 
Regards,
Ramya.S
 

________________________________

From: unmesha sreeveni [mailto:unmeshabiju@gmail.com]
Sent: Wed 1/15/2014 4:11 PM
To: User Hadoop
Subject: Re: Sorting a csv file


I wrote a map-only job to sort a text file by editing the WordCount program.
I only need the key.
How do I set the value to null?


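import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SortingCsv {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            System.out.println("in mapper");
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
                System.out.println("sort: " + word);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("in main");
        Configuration conf = new Configuration();

        Job job = new Job(conf, "wordcount");
        job.setJarByClass(SortingCsv.class);
        //Path intermediateInfo = new Path("out");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // No reducer class is set, so the default identity reducer runs
        // and the output arrives sorted by key.
        job.setMapperClass(Map.class);
        FileSystem fs = FileSystem.get(conf);

        /* Delete the files if any in the output path */
        Path outputPath = new Path(args[1]);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);

        job.waitForCompletion(true);
    }
}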


On Wed, Jan 15, 2014 at 2:50 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:


	How do I sort a CSV file?
	I know that shuffle and sort take place between map and reduce.
	But how do I sort each column in a CSV file?
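To make the question concrete, here is a minimal plain-Java sketch of sorting CSV rows by one chosen column, outside Hadoop. It assumes that "sorting a column" means ordering whole rows by the text in that column; the class name CsvColumnSort and the method sortByColumn are hypothetical. In a MapReduce job the same idea would roughly correspond to emitting the chosen column as the map output key, since the shuffle sorts by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the original job.
class CsvColumnSort {

    // Returns a new list with the rows ordered by the text in the given
    // zero-based column. The comparison is lexicographic, so "10" sorts before "2".
    static List<String> sortByColumn(List<String> rows, int column) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((String row) -> row.split(",", -1)[column]));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("3,pear", "1,fig", "2,apple");
        System.out.println(sortByColumn(rows, 0)); // prints [1,fig, 2,apple, 3,pear]
        System.out.println(sortByColumn(rows, 1)); // prints [2,apple, 1,fig, 3,pear]
    }
}
```

For numeric columns, the key would need to be parsed (for example with Integer.parseInt) before comparing; this sketch does not handle quoted fields that contain commas.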
	

	-- 
	
	Thanks & Regards 
	
	
	Unmesha Sreeveni U.B
	
	Junior Developer

	http://www.unmeshasreeveni.blogspot.in/
	

	
	







