hadoop-common-user mailing list archives

From Aman <aman_d...@hotmail.com>
Subject Re: InputFormat for a big file
Date Fri, 17 Dec 2010 21:51:14 GMT

Use FileInputFormat (the default TextInputFormat is one): each mapper processes a whole split, not a single line.

Your mapper will look something like this:

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    // long rather than int, to avoid overflow when summing many lines
    private long sum = 0;

    @Override
    public void map(LongWritable key, Text value, Context context) {
        // Accumulate the number on each input line of this split.
        sum += Long.parseLong(value.toString().trim());
    }

    @Override
    public void cleanup(Context context) throws IOException,
            InterruptedException {
        // Emit one partial sum per map task.
        context.write(new Text("sum"), new Text(String.valueOf(sum)));
    }
}

Your reducer will look something like this:

public class MyReducer extends Reducer<Text, Text, Text, NullWritable> {
    private NullWritable outputValue = NullWritable.get();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;  // long, to match the mappers' partial sums
        // Add up the partial sums emitted by the map tasks.
        for (Text value : values) {
            sum += Long.parseLong(value.toString());
        }
        context.write(new Text(String.valueOf(sum)), outputValue);
    }
}
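Wired together, a minimal driver might look like the sketch below. The class name SumDriver and the single-reducer setting are my assumptions, not from the original post; it uses the new org.apache.hadoop.mapreduce API of that era, and setNumReduceTasks(1) ensures all partial sums meet in one reduce call:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver wiring MyMapper and MyReducer into a job.
public class SumDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "sum");  // Job.getInstance(conf, "sum") on newer Hadoop
        job.setJarByClass(SumDriver.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setNumReduceTasks(1);        // one reducer => one global sum

        // Map output types differ from the final output types, so set both.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // TextInputFormat (a FileInputFormat) is the default input format.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```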



madhu phatak wrote:
> Hi
> I have a very large file of size 1.4 GB. Each line of the file is a number.
> I want to find the sum of all those numbers.
> I wanted to use NLineInputFormat as the InputFormat, but it sends only one
> line to the mapper, which is very inefficient.
> So can you guide me on writing an InputFormat which splits the file
> into multiple splits, so that each mapper can read multiple
> lines from its split?
> Regards
> Madhukar

View this message in context: http://lucene.472066.n3.nabble.com/InputFormat-for-a-big-file-tp2105461p2107514.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
