hadoop-common-user mailing list archives

From madhu phatak <phatak....@gmail.com>
Subject Re: InputFormat for a big file
Date Mon, 20 Dec 2010 08:59:31 GMT
If I use FileInputFormat, I get an instantiation error, because FileInputFormat
is an abstract class.

On Sat, Dec 18, 2010 at 3:21 AM, Aman <aman_doon@hotmail.com> wrote:

>
> Use FileInputFormat
>
>
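> Your mapper will look something like this:
>
> import java.io.IOException;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> // Each map task keeps a running total of the numbers in its split
> // and emits a single partial sum when the task finishes.
> public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
>   private long sum = 0;  // long: the total of a large file can overflow an int
>
>   @Override
>   public void map(LongWritable key, Text value, Context context) {
>     String line = value.toString().trim();
>     if (!line.isEmpty()) {
>       sum += Long.parseLong(line);
>     }
>   }
>
>   @Override
>   public void cleanup(Context context)
>       throws IOException, InterruptedException {
>     // One shared key, so a single reduce call sees every partial sum.
>     context.write(new Text("sum"), new Text(Long.toString(sum)));
>   }
> }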
>
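> Your reducer will look something like this:
>
> import java.io.IOException;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Reducer;
>
> public class MyReducer extends Reducer<Text, Text, Text, NullWritable> {
>   private final NullWritable outputValue = NullWritable.get();
>
>   @Override
>   public void reduce(Text key, Iterable<Text> values, Context context)
>       throws IOException, InterruptedException {
>     // Add up the partial sums emitted by the map tasks.
>     long sum = 0;
>     for (Text value : values) {
>       sum += Long.parseLong(value.toString());
>     }
>     context.write(new Text(Long.toString(sum)), outputValue);
>   }
> }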
>
>
> madhu phatak wrote:
> >
> > Hi,
> > I have a very large file, 1.4 GB in size, in which each line is a
> > number. I want to find the sum of all those numbers.
> > I wanted to use NLineInputFormat as the InputFormat, but it sends only
> > one line to each Mapper, which is very inefficient.
> > Can you guide me in writing an InputFormat that splits the file into
> > multiple splits so that each mapper can read multiple lines from its
> > split?
> >
> > Regards
> > Madhukar
> >
> >
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/InputFormat-for-a-big-file-tp2105461p2107514.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>
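
The approach in the quoted reply keeps one running total per map task, emits it once from cleanup(), and adds the partial totals in a single reduce call. FileInputFormat itself cannot be instantiated; TextInputFormat is its concrete subclass for line-oriented text (and the default input format of a job), handing each map task a whole split of lines keyed by byte offset. The plain-Java sketch below simulates that data flow without any Hadoop dependency, purely as an illustration. The class PartialSumSimulation and its methods are hypothetical names, and the sketch cuts the input into splits by line count for simplicity, whereas Hadoop's FileInputFormat divides a file by byte ranges (typically one HDFS block per split).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Hadoop-free simulation of the quoted approach:
 * each "map task" sums the lines of its split and emits one partial sum
 * (as cleanup() does), then one "reduce" call adds the partial sums.
 */
public class PartialSumSimulation {

    /** Mapper stand-in: sum every non-blank line in one split. */
    public static long mapSplit(List<String> splitLines) {
        long sum = 0;
        for (String line : splitLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                sum += Long.parseLong(trimmed);
            }
        }
        return sum; // the value cleanup() would write under the key "sum"
    }

    /** Reducer stand-in: add up the partial sums from all map tasks. */
    public static long reduce(List<Long> partialSums) {
        long total = 0;
        for (long partial : partialSums) {
            total += partial;
        }
        return total;
    }

    /**
     * Simplification: cut the input into consecutive chunks of at most
     * linesPerSplit lines (Hadoop itself splits by byte ranges).
     */
    public static List<List<String>> split(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(lines.subList(start, end));
        }
        return splits;
    }

    /** Run every simulated map task, then the single simulated reduce. */
    public static long sumLines(List<String> lines, int linesPerSplit) {
        List<Long> partials = new ArrayList<>();
        for (List<String> s : split(lines, linesPerSplit)) {
            partials.add(mapSplit(s));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1", "2", "3", "4", "5", "6", "7");
        System.out.println(sumLines(lines, 3));
    }
}
```

With the seven sample lines split three at a time, the partial sums are 6, 15 and 7, so main prints 28. Only one value per map task crosses to the reducer, which is why this pattern avoids shuffling every line of the 1.4 GB file.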
