hadoop-common-user mailing list archives

From Jens Scheidtmann <jens.scheidtm...@gmail.com>
Subject Types and SequenceFiles
Date Thu, 30 May 2013 20:09:28 GMT
Dear list,

I have created a sequence file like this:

    seqWriter = SequenceFile.createWriter(fs, getConf(),
        new Path(hdfsPath), IntWritable.class, BytesWritable.class,
        SequenceFile.CompressionType.NONE);
    seqWriter.append(new IntWritable(index++), new BytesWritable(buf));

(where buf is a byte array.)

Now, when reading the same sequence file in a map reduce job, I specify the
mapper like this:

    public static class NoOfMovesMapper
        extends Mapper<IntWritable, BytesWritable, IntWritable, IntWritable> {

and configure the SequenceFile as:

    SequenceFileAsBinaryInputFormat.addInputPath(jobConf, new Path(args[i]));
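(Editorial note: a LongWritable/Text pair is exactly what the default TextInputFormat produces -- the byte offset and the line -- which is consistent with the job never being told which input format to use; addInputPath only registers the path. A minimal sketch of the configuration step that appears to be missing, assuming the new-API Job class; the job name and variable names here are illustrative, not from the original post:)

```java
// Sketch only: explicitly select a sequence-file input format (new API).
// Without setInputFormatClass, the default TextInputFormat is used, which
// emits LongWritable (byte offset) / Text (line) pairs -- matching the
// ClassCastException reported below.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

Job job = Job.getInstance(getConf(), "no-of-moves");   // illustrative name
job.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.addInputPath(job, new Path(hdfsPath));
```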

This job fails with:

    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable
        cannot be cast to org.apache.hadoop.io.IntWritable
        at org.gostats.hadoop.NoOfMoves$NoOfMovesMapper.map(NoOfMoves.java:1)

I have to specify the mapper as

     extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

to read the sequence file. But then the number of records, and hence map
invocations, is much larger than I expect. I thought map would be invoked
once per record in the sequence file.
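(Editorial note: one way to check that expectation is to count the records directly with SequenceFile.Reader and compare against the map-input counter of the job. A hedged sketch, assuming fs, hdfsPath, and getConf() exist as in the writer snippet above:)

```java
// Sketch only: count the records in the sequence file directly, to
// compare with the number of map() invocations the job reports.
SequenceFile.Reader reader =
    new SequenceFile.Reader(fs, new Path(hdfsPath), getConf());
IntWritable key = new IntWritable();
BytesWritable value = new BytesWritable();
long records = 0;
while (reader.next(key, value)) {
    records++;
}
reader.close();
System.out.println("records in file: " + records);
```

If the file holds N records but the mapper runs far more often, the job is almost certainly splitting the file as lines of text rather than as sequence-file records.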

What am I doing wrong?

Thanks in advance,

Jens
