giraph-user mailing list archives

From Vitaly Tsvetkoff <vi.v.tsvetk...@gmail.com>
Subject How to specify filtering in hbase "query" during input superstep
Date Wed, 09 Sep 2015 06:44:08 GMT
Hello!
I use giraph-hbase and have written a custom input format,
CustomHBaseTableInputFormat.
I want to apply HBase filters (such as o.a.h.hbase.filter.RowFilter,
FamilyFilter, etc.) so that the "query" itself returns only the data I need.
For example, I want to load only the vertices whose row key matches a
specified id. Is this possible?
This is what I tried:
public class CustomHBaseTableInputFormat extends HBaseVertexInputFormat {
    @Override
    public VertexReader<Text, FloatWritable, FloatWritable> createVertexReader(
            InputSplit split, TaskAttemptContext context) throws IOException {
        return new CustomHBaseReader(split, context);
    }
    // other methods to implement

    public static class CustomHBaseReader extends HBaseVertexReader {
        public CustomHBaseReader(InputSplit split, TaskAttemptContext context)
                throws IOException {
            super(split, context);
        }

        @Override
        public void initialize(InputSplit inputSplit, TaskAttemptContext context)
                throws IOException, InterruptedException {
            super.initialize(inputSplit, context);
            String startIdsRegexp = getStartVertexRegexp();
            System.err.println("set row filter with regexp=" + startIdsRegexp);
            // Keep only rows whose key matches the regexp.
            Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                    new RegexStringComparator(startIdsRegexp));
            // Attach the filter to the scan held by the shared base format.
            Scan scan =
                    HBaseVertexInputFormat.BASE_FORMAT.getScan().setFilter(rowFilter);
            System.err.println("scan=" + scan);
            //super.initialize(inputSplit, context);
        }
    }
    // other methods to implement
}
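
To make the intended selection concrete, here is a minimal sketch in plain
java.util.regex, with no HBase calls. It assumes (I have not verified this)
that RegexStringComparator with CompareOp.EQUAL keeps a row when the regexp
is found anywhere in the row key, so an exact id would need anchors. The
class name RowKeyRegexSketch, the select helper and the sample keys are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RowKeyRegexSketch {
    // Returns the row keys in which the regexp is found (find() semantics,
    // i.e. a match anywhere in the key is enough). This only models the
    // selection I expect; it does not call HBase.
    static List<String> select(String regexp, List<String> rowKeys) {
        Pattern pattern = Pattern.compile(regexp);
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (pattern.matcher(key).find()) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical row keys, for illustration only.
        List<String> keys = Arrays.asList("v1", "v10", "v2");
        System.out.println(select("v1", keys));   // prints [v1, v10]
        System.out.println(select("^v1$", keys)); // prints [v1]
    }
}
```

So with an anchored regexp I would expect exactly one vertex to be read.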

The log shows that the scan contains my filter, but the whole dataset is
still read (no filter is applied).

I know about the vertexInputFilterClass property, but it filters only after
the query has already returned a lot of unneeded data.
What is the correct way to set filters? Can I use the o.a.h.hbase.filter
package for this? If so, what am I doing wrong?
