hbase-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Loader for small files
Date Mon, 11 Feb 2013 18:22:05 GMT
Hello,

We are running into performance issues with Pig/Hadoop because our input
files are small: everything goes to a single mapper.  To work around this,
we are trying to use our own loader, as follows:

1)  Extend PigStorage:

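import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.pig.builtin.PigStorage;

// PigStorage variant that asks for line-count-based input splits.
public class SmallFileStorage extends PigStorage {

    public SmallFileStorage(String delimiter) {
        super(delimiter);
    }

    // Swap PigStorage's default input format for NLineInputFormat.
    @Override
    public InputFormat getInputFormat() {
        return new NLineInputFormat();
    }
}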



2)  Add a command-line argument to the Pig command as follows:

-Dmapreduce.input.lineinputformat.linespermap=500000



3)  Use SmallFileStorage in the Pig script as follows:

USING com.xxx.yyy.SmallFileStorage ('\t')
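
As a sanity check on step 2, here is a rough sketch of how many splits that setting would be expected to produce. It assumes NLineInputFormat emits one split per group of N lines within each file and never merges lines across files, and it ignores anything Pig does to the splits afterward (for example, combining small splits). The class name, method names and line counts are made up for illustration; none of them come from Pig or Hadoop.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Pig or Hadoop: it only estimates split counts.
final class SplitEstimate {

    // Splits expected from one file: one per full or partial group of linesPerMap lines.
    static long splitsFor(long linesInFile, int linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        if (linesInFile <= 0) {
            return 0; // an empty file contributes no splits
        }
        return (linesInFile + linesPerMap - 1) / linesPerMap; // ceiling division
    }

    // Total across files, assuming lines from different files are never grouped together.
    static long totalSplits(long[] linesPerFile, int linesPerMap) {
        return Arrays.stream(linesPerFile)
                .map(n -> splitsFor(n, linesPerMap))
                .sum();
    }

    public static void main(String[] args) {
        long[] linesPerFile = {20_000L, 35_000L, 1_200_000L}; // made-up line counts
        System.out.println(totalSplits(linesPerFile, 500_000));
    }
}
```

Under these assumptions, a file with fewer than 500000 lines yields exactly one split, so with small files a lower linespermap value would be needed before the number of splits rises above the number of files.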


But this doesn't seem to work: everything still goes to one mapper.
Before we spend any more time on this, is this a good approach, or is
there a better one?  Please let me know.  Thanks.
