From: Michael Segel
Reply-To: user@hbase.apache.org
Subject: RE: Using external indexes in an HBase Map/Reduce job...
Date: Tue, 12 Oct 2010 11:13:37 -0500

Thanks for the reply...

That's not exactly what I'm looking for...

Suppose you have an external system which provides you the list of row keys you want, whatever that system is.
So you have a Java List object and you want to run an M/R job based on input from that List.

What's the best way to do it?

> From: octo47@gmail.com
> Date: Tue, 12 Oct 2010 16:54:00 +0400
> Subject: Re: Using external indexes in an HBase Map/Reduce job...
> To: user@hbase.apache.org
>
> Hi Michael Segel.
>
> If I understand your question correctly, you are looking for an optimal way
> to scan index search results? If not, my answer below is not relevant :).
>
> 1. For MR joins or scans of large index results, bloom filters can be used
> as described here:
> http://blog.rapleaf.com/dev/2009/09/25/batch-querying-with-cascading/
>
> 2. Another option: denormalize the data in the same or a separate table
> (depends on the nature of the object relations).
>
> 3. Random gets. For each row from Solr, issue a random get (for really
> small result sets or paging).
>
> 4. Put compacted data (latest data, a small subset of data, etc.) into the Solr index.
>
>
> 2010/10/12 Michael Segel :
> >
> > Hi,
> >
> > Now I realize that most everyone is sitting in NY, while some of us can't leave our respective cities....
> >
> > Came across this problem and I was wondering how others solved it.
> >
> > Suppose you have a really large table with 1 billion rows of data.
> > Since HBase really doesn't have any indexes built in (Don't get me started about the contrib/transactional stuff...), you're forced to use some sort of external index, or roll your own index table.
> >
> > The net result is that you end up with a list object that contains your result set.
> >
> > So the question is... what's the best way to feed the list object in?
> >
> > One option I thought about is writing the object to a file, then using that file as the job input and controlling the splits. Not the most efficient, but it would work.
> >
> > Was trying to find a more 'elegant' solution and I'm sure that anyone using SOLR or LUCENE or whatever... has come across this problem too.
> >
> > Any suggestions?
> >
> > Thx
> >
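
For reference, the file-based option mentioned in the quoted message (write the key list to a file, then control how it is split) could be sketched roughly as below. This is only an illustration, not anything posted in the thread: the class name RowKeyInputWriter and both method names are hypothetical, it assumes the row keys are printable strings without embedded newlines, and it uses only the Java standard library. The chunk() method merely mimics what a line-based input format such as Hadoop's NLineInputFormat would do when it hands each map task a fixed number of lines; the mapper would then be expected to issue one get per key.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: turns an in-memory list of row keys into a
// one-key-per-line file that a line-oriented M/R input format can read.
public class RowKeyInputWriter {

    /** Sorts a copy of the keys and writes them to 'out', one key per line. */
    public static List<String> writeSortedKeys(List<String> rowKeys, Path out)
            throws IOException {
        List<String> sorted = new ArrayList<>(rowKeys);
        // Assumption: sorting keeps neighbouring keys together, so the keys
        // in one split are more likely to fall in the same part of the table.
        Collections.sort(sorted);
        Files.write(out, sorted, StandardCharsets.UTF_8);
        return sorted;
    }

    /** Groups keys into consecutive chunks of at most keysPerSplit entries. */
    public static List<List<String>> chunk(List<String> keys, int keysPerSplit) {
        if (keysPerSplit <= 0) {
            throw new IllegalArgumentException("keysPerSplit must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += keysPerSplit) {
            int end = Math.min(i + keysPerSplit, keys.size());
            chunks.add(new ArrayList<>(keys.subList(i, end)));
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Made-up keys standing in for the result set of an external index.
        List<String> keys = Arrays.asList(
                "row-0042", "row-0007", "row-0913", "row-0100", "row-0555");
        Path out = Files.createTempFile("rowkeys", ".txt");
        List<String> sorted = writeSortedKeys(keys, out);
        System.out.println("wrote " + sorted.size() + " keys to " + out);
        System.out.println("splits: " + chunk(sorted, 2));
    }
}
```

The file would still have to be copied somewhere the job can read it (HDFS, for example), and the number of keys per split is a tuning choice: fewer keys per split means more map tasks, each doing fewer gets.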