hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Multiple Outputs Not Being Written to File
Date Fri, 06 May 2011 17:55:44 GMT

I am attempting to take a large file and split it up into a series of
smaller files.  I want the smaller files to be named based on values taken
from the large file.  I am using
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs to do this.

The job runs without error and produces a set of files as expected, and each
file is named as expected.  But most of the files are empty.  Apparently, no
data was written to them.  The fact that a file was created at all should
confirm that data was coming in from the mapper.  My reducer counts the
values as it iterates through them and then logs the count.  I am seeing
reasonable counts in my logs.  The number of lines in an output file should
equal the count.  I have counts but no lines.

What could be causing this?

My Mapper:
    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        // Key each record by its departure locale.
        String[] ss = value.toString().split(",");
        String locale = ss[F.DEPARTURE_LOCALE];
        ctx.write(new Text(locale), value);
    }

My Reducer:
    private MultipleOutputs<Text, Text> mos;

    protected void setup(Context ctx) throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, Text>(ctx);
    }

    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        int k = 0;
        /*
         * The key at this point can have blanks and slashes. Let us get rid
         * of both.
         */
        String blankless = key.toString().replace(' ', '+');
        String path = blankless.replace("/", "");
        try {
            for (Text value : values) {
                k++; // count each value seen for this key
                String[] ss = value.toString().split(F.DELIMITER);
                String id = ss[F.ID];
                String[] sslessid = Arrays.copyOfRange(ss, 1, ss.length);
                String line = UT.array2String(sslessid);

                // An output file is being created,
                mos.write(new Text(id), new Text(line), path);
            }
        } catch (NullPointerException e) {
            LOG.error("<br/>" + "blankless=" + blankless);
            LOG.error("<br/>" + "values=" + values.toString());
        }

        // In my logs, I see reasonable counts even when the output file is empty.
        LOG.info("<br/>key=" + path + " count=" + k);
    }
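
For reference, here is the key cleanup from the reducer in isolation, as a
minimal standalone sketch.  The class name PathNameSketch, the method
toPathName, and the sample key are made up for illustration and are not part
of the job itself.

```java
public class PathNameSketch {

    // Hypothetical helper mirroring the two replace calls in the reducer:
    // blanks become '+', slashes are dropped.
    static String toPathName(String key) {
        String blankless = key.replace(' ', '+');
        return blankless.replace("/", "");
    }

    public static void main(String[] args) {
        // Made-up sample key, not taken from the real input file.
        System.out.println(toPathName("New York/JFK")); // prints New+YorkJFK
    }
}
```
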
Geoffry Roberts
