hadoop-common-user mailing list archives

From Jimmy Wan <ji...@indeed.com>
Subject Batching key/value pairs to map
Date Mon, 23 Feb 2009 20:06:29 GMT
Part of my map/reduce process could be sped up considerably by mapping
key/value pairs in batches instead of one at a time. I'd like to do the
following:
    protected abstract void batchMap(OutputCollector<K2, V2> k2V2OutputCollector, Reporter reporter) throws IOException;

    public void map(K1 key1, V1 value1, OutputCollector<K2, V2> output, Reporter reporter) throws IOException {
        // Buffer copies, because the framework may reuse the key/value objects between calls.
        keys.add(key1.copy());
        values.add(value1.copy());
        if (++currentSize == batchSize) {
            batchMap(output, reporter);
            clear();
        }
    }

    public void close() throws IOException {
        // Flush the final, partially filled batch.
        if (currentSize > 0) {
            // I don't have access to my OutputCollector or Reporter here!
            batchMap(output, reporter);
            clear();
        }
    }
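The buffering logic above (fill a batch, flush when it reaches batchSize, flush any remainder at close) can be sketched independently of Hadoop. This is a minimal, hypothetical sketch: `BatchBuffer` and its `Consumer<List<T>>` flush callback are illustrative names, not part of any Hadoop API. In a real mapper the callback would stand in for batchMap, which is why close() needs a collector and reporter that are still reachable after the last map() call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, Hadoop-free sketch of the batching pattern: items are
// buffered until batchSize is reached, then handed to a flush callback.
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> flush; // stand-in for batchMap(output, reporter)
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flush = flush;
    }

    // Analogous to map(): buffer one item and flush once the batch is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() == batchSize) {
            emit();
        }
    }

    // Analogous to close(): flush whatever partial batch is left over.
    void close() {
        if (!buffer.isEmpty()) {
            emit();
        }
    }

    private void emit() {
        // Pass a copy so that clearing the buffer cannot affect the callback's list.
        flush.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

For example, with a batch size of 3 and seven items, add() triggers two flushes of three items each, and close() flushes the last item on its own.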

Can I safely hold on to the OutputCollector and Reporter passed to map()
and use them later in close()?

I'm currently running Hadoop 0.17.2.1. Is this something I could do in
Hadoop 0.19.X?
