From: "David G. Boney"
To: "dev@hbase.apache.org"
Subject: Bloom filter based scanner/filter - repost
Date: Tue, 15 Jan 2013 14:41:48 -0600
Message-Id: <72D1C755-EF52-4889-AED1-778C978BFEB2@austin-acm-sigkdd.org>

Sorry, I hit the send button before finishing the message.

I am building a data cube on top of HBase. All access to the data is by map/reduce jobs. I want to build a scanner whose first matching criterion is the set intersection of Bloom filters, followed by additional matching criteria specified in the current filter architecture.

First, I run a map/reduce job on table A. For every row I match in table A, I add the row key to a Bloom filter. I then run a map/reduce job on table B, where the row keys are over the same domain as table A. I want to build a scanner that can use the built-in Bloom filters in HBase. When the scanner goes to get the block of data to which a row-key-based Bloom filter is attached, it does a set intersection with the table A Bloom filter to see if any of the keys from table A are in the block. If so, the block is read in and the scanner does additional matching on the rows according to the filter.

This is a simplification of my problem. I am trying to find out what the complexity of implementing such a feature would be in HBase.

-----------------
Sincerely,
David G. Boney
Chair, Austin ACM SIGKDD
chair@austin-acm-sigkdd.org
http://www.meetup.com/Austin-ACM-SIGKDD/
http://tech.groups.yahoo.com/group/austinsigkdd/
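
The block-skipping check described in the message can be sketched as follows. This is a toy illustration only, not HBase code: `SimpleBloom`, its parameters, and the salted SHA-256 hashing are all hypothetical. It assumes the table A filter and each block filter use the same bit-array size and the same hash functions, which is what makes a bitwise AND meaningful. Note that the check is one-sided: an empty AND proves no key is shared, so the block can be skipped, while a non-empty AND only means a shared key is possible and the block still has to be read and filtered.

```python
import hashlib


class SimpleBloom:
    """Toy Bloom filter (hypothetical; not HBase's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def may_intersect(self, other: "SimpleBloom") -> bool:
        """False means no key is shared; True means a shared key is possible."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash functions")
        # A shared key sets the same positions in both filters,
        # so the AND of the bit arrays cannot be empty if any key is shared.
        return (self.bits & other.bits) != 0


# Filter built from the row keys matched in table A.
table_a = SimpleBloom()
for key in (b"row-0001", b"row-0002"):
    table_a.add(key)

# Filter attached to one block of table B.
block = SimpleBloom()
for key in (b"row-0002", b"row-0500"):
    block.add(key)

if table_a.may_intersect(block):
    print("read block and apply the remaining filter criteria")
else:
    print("skip block")
```

Because unrelated keys can set overlapping bits, the fraction of blocks this check can skip depends on how sparsely populated the two filters are.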