Date: Wed, 9 Sep 2015 10:44:08 +0400
Subject: How to specify filtering in hbase "query" during input superstep
From: Vitaly Tsvetkoff
To: user@giraph.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114daa86389fc7051f4acf25

--001a114daa86389fc7051f4acf25
Content-Type: text/plain; charset=UTF-8

Hello!

I use giraph-hbase and have written a custom input format, CustomHBaseTableInputFormat. I want to apply HBase filters (such as o.a.h.hbase.filter.RowFilter, FamilyFilter, etc.) so that the "query" itself returns only the data I need. For example, I want to read only the vertices whose row key matches a given id. Is that possible?

I tried to do it like this:

    public class CustomHBaseTableInputFormat extends HBaseVertexInputFormat {

        @Override
        public VertexReader<Text, FloatWritable, FloatWritable> createVertexReader(
                InputSplit split, TaskAttemptContext context) throws IOException {
            return new CustomHBaseReader(split, context);
        }

        // other methods to implement

        public static class CustomHBaseReader extends HBaseVertexReader {

            public CustomHBaseReader(InputSplit split, TaskAttemptContext context)
                    throws IOException {
                super(split, context);
            }

            @Override
            public void initialize(InputSplit inputSplit, TaskAttemptContext context)
                    throws IOException, InterruptedException {
                super.initialize(inputSplit, context);

                // Build a row filter that keeps only row keys matching the regexp.
                String startIdsRegexp = getStartVertexRegexp();
                System.err.println("set row filter with regexp=" + startIdsRegexp);
                Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                        new RegexStringComparator(startIdsRegexp));

                // Attach the filter to the scan held by the shared base format.
                Scan scan = HBaseVertexInputFormat.BASE_FORMAT.getScan().setFilter(rowFilter);
                System.err.println("scan=" + scan);
                //super.initialize(inputSplit, context);
            }

            // other methods to implement
        }
    }

The log shows that the scan contains my filter, but the whole dataset is still read, as if no filter were applied.

I know about the vertexInputFilterClass property, but it filters only after the query, once a lot of unneeded data has already been read.

What is the correct way to set filters? Can I use the o.a.h.hbase.filter package for this? If so, what am I doing wrong?
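
To make the expected result concrete, here is a minimal plain-Java sketch (no HBase or Giraph classes) of the row-key selection I expect the RowFilter to perform before any data reaches the vertex reader. The class RowKeyRegexSketch, the helper selectRowKeys, and the sample row keys are made up for illustration. I am also assuming that the regexp only has to be found somewhere in the row key, which is why the example anchors it with ^ and $.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Plain-Java stand-in for the intended row-key selection; no HBase classes are used. */
public class RowKeyRegexSketch {

    /**
     * Hypothetical helper: keeps the row keys in which the regexp is found.
     * Assumption: the filter treats a key as a match when the pattern occurs
     * anywhere in it, so an exact-id match needs explicit ^ and $ anchors.
     */
    public static List<String> selectRowKeys(List<String> rowKeys, String regexp) {
        Pattern pattern = Pattern.compile(regexp);
        List<String> selected = new ArrayList<>();
        for (String rowKey : rowKeys) {
            if (pattern.matcher(rowKey).find()) {
                selected.add(rowKey);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up row keys standing in for the vertex ids stored in the table.
        List<String> rowKeys = Arrays.asList("v1", "v2", "v10", "x1");
        System.out.println(selectRowKeys(rowKeys, "^v1$")); // prints [v1]
    }
}
```

With these sample keys, only v1 should reach the vertex reader. In my job the whole table is read instead.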
--001a114daa86389fc7051f4acf25--