giraph-user mailing list archives

From Vitaly Tsvetkoff
Subject How to specify filtering in hbase "query" during input superstep
Date Wed, 09 Sep 2015 06:44:08 GMT
I use giraph-hbase and have written a custom input format,
CustomHBaseTableInputFormat.
I want to apply HBase filters (such as o.a.h.hbase.filter.RowFilter,
FamilyFilter, etc.) so that the "query" returns only the data I need. For
example, I want to read only the vertices whose row key matches a specified
id. Is this possible?
This is what I tried:
public class CustomHBaseTableInputFormat extends HBaseVertexInputFormat {
    public VertexReader<Text, FloatWritable, FloatWritable> createVertexReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        return new CustomHBaseReader(split, context);
    }
    // other methods to implement

    public static class CustomHBaseReader extends HBaseVertexReader {
        public CustomHBaseReader(InputSplit split, TaskAttemptContext context)
                throws IOException {
            super(split, context);
        }

        public void initialize(InputSplit inputSplit, TaskAttemptContext context)
                throws IOException, InterruptedException {
            super.initialize(inputSplit, context);
            String startIdsRegexp = getStartVertexRegexp();
            System.err.println("set row filter with regexp=" +
            Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                    new RegexStringComparator(startIdsRegexp));
            Scan scan =
            System.err.println("scan=" + scan);
            //super.initialize(inputSplit, context);
        }
        // other methods to implement
    }
}

The log shows that the scan contains my filter, but the whole dataset is
still read (no filter is applied).

I know about the vertexInputFilterClass property, but it filters only after
the query has already returned a lot of unneeded data.
What is the correct way to set filters? Can I use the o.a.h.hbase.filter
package for this? If so, what am I doing wrong?
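
To make the distinction above concrete (filtering as part of the "query" versus filtering after the read), here is a minimal stand-alone sketch. It does not use HBase or Giraph: the class RowKeyFilterSketch, its methods, and the sample row keys are hypothetical, and the find-style regex match is only an assumption standing in for the RowFilter and RegexStringComparator pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Stand-alone illustration only; no HBase or Giraph classes are involved.
final class RowKeyFilterSketch {

    /** Rows kept by the filter, plus how many rows were handed to the reader. */
    static final class ReadResult {
        final List<String> kept;
        final int rowsTransferred;

        ReadResult(List<String> kept, int rowsTransferred) {
            this.kept = kept;
            this.rowsTransferred = rowsTransferred;
        }
    }

    // Filter evaluated as part of the "query": only matching rows reach the reader.
    static ReadResult filterInScan(List<String> table, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String rowKey : table) {
            if (pattern.matcher(rowKey).find()) {
                kept.add(rowKey);
            }
        }
        return new ReadResult(kept, kept.size());
    }

    // Filter evaluated after the read: every row is transferred, then most are dropped.
    static ReadResult filterAfterRead(List<String> table, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        int transferred = 0;
        for (String rowKey : table) {
            transferred++;
            if (pattern.matcher(rowKey).find()) {
                kept.add(rowKey);
            }
        }
        return new ReadResult(kept, transferred);
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("v1", "v2", "v10", "x3");
        // The regex is anchored so that "v10" is not matched as well.
        ReadResult inScan = filterInScan(table, "^v1$");
        ReadResult afterRead = filterAfterRead(table, "^v1$");
        System.out.println("in scan:    kept=" + inScan.kept
                + " transferred=" + inScan.rowsTransferred);
        System.out.println("after read: kept=" + afterRead.kept
                + " transferred=" + afterRead.rowsTransferred);
    }
}
```

Both paths keep the same vertices; they differ only in how many rows have to be transferred before the filter runs, which is the cost I would like to avoid.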
