hadoop-hdfs-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Reducer that outputs no key
Date Fri, 24 May 2013 07:55:43 GMT
Hello,

I am trying to use Hadoop Streaming to produce output that contains no key,
only the value.

Here's what I am trying:

1)  Created IdentifierResolver as follows:

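import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.streaming.io.IdentifierResolver;

public class MyIdentifierResolver extends IdentifierResolver {

    @Override
    public void resolve(String identifier) {
        System.out.println("Entered resolve with identifier: " + identifier);
        // Let the default resolver set up the reader/writer and key/value classes first.
        super.resolve(identifier);
        if (identifier.equals("NullWritable")) {
            System.out.println("Setting output key class to NullWritable");
            // Override only the output key class for the custom identifier.
            setOutputKeyClass(NullWritable.class);
        }
    }
}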


2)  Set the properties as follows:

-Dstream.io.identifier.resolver.class=com.my.package.MyIdentifierResolver \
-Dstream.map.output=NullWritable \
-Dstream.reduce.output=NullWritable


This should work, right?  But the key is still being written to the output.
Is there a better way to do this in Hadoop?
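
For context, here is a minimal sketch of why a key can still show up. It assumes the documented Hadoop Streaming default, where each line a streaming task writes is split at the first tab into a key and a value, and a line with no tab becomes a key with an empty value. The class TabSplitSketch and its split method are hypothetical names used only for illustration; this is not Hadoop's own code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TabSplitSketch {

    // Hypothetical helper (not a Hadoop API): split a line at the first tab.
    // Text before the tab is treated as the key, the rest as the value.
    // A line without a tab becomes the key, paired with an empty value.
    static Map.Entry<String, String> split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        String[] samples = {"id42\tpayload", "payload only", "a\tb\tc"};
        for (String sample : samples) {
            Map.Entry<String, String> kv = split(sample);
            // Brackets make empty values visible in the printed result.
            System.out.println("key=[" + kv.getKey() + "] value=[" + kv.getValue() + "]");
        }
    }
}
```

Under that assumption, changing only the declared output key class would not change which part of each line is treated as the key.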

Note:  Basically, we are trying to merge a large number of files (over 2000)
into a smaller number of files (e.g. 500).  The files are too big for
'getmerge' to work, because we run into space issues.

Please help.  Thanks.
