Date: Wed, 1 Aug 2012 09:18:50 +0100 (BST)
From: Christian Schäfer
Subject: Re: How to query by rowKey-infix
To: user@hbase.apache.org

Thanks Matt & Jerry for your replies.

The data for each row is small (a few hundred bytes).

So, I will try the parallel table scan first, as you
suggested. But before organizing that myself, wouldn't it be a better idea
to write a MapReduce job for it?

I'm not so keen on implementing secondary indexes, especially due to the
mentioned consistency concerns.
Unfortunately, projects like ithbase and ihbase no longer support current
HBase, and secondary indexes based on coprocessors don't seem to be there
yet.
If I'm wrong, feel free to correct me :)

regards,
Chris



----- Original Message -----
From: Matt Corgan
To: user@hbase.apache.org
CC: Christian Schäfer
Sent: Tuesday, 31 July 2012, 19:41
Subject: Re: How to query by rowKey-infix

When deciding between a table scan and a secondary index, you should try to
estimate what percentage of the underlying data blocks will be used in the
query. By default, each block is 64KB.

If each user's data is small and you are fitting multiple users per block,
then you're going to need all the blocks, so a table scan is better because
it's simpler. If each user has 1MB+ of data, then you will want to pick out
the individual blocks relevant to each date. The secondary index will help
you go directly to those sparse blocks, but at a cost in complexity,
consistency, and extra denormalized data that knocks primary data out of
your block cache.

If latency is not a concern, I would start with the table scan. If that's
too slow, add the secondary index, and if you still need it faster, do the
primary key lookups in parallel as Jerry mentions.

Matt

On Tue, Jul 31, 2012 at 10:10 AM, Jerry Lam wrote:

> Hi Chris:
>
> I'm thinking about building a secondary index for primary key lookup, then
> querying with the primary keys in parallel.
>
> I'm interested to see if there are other options too.
>
> Best Regards,
>
> Jerry
>
> On Tue, Jul 31, 2012 at 11:27 AM, Christian Schäfer wrote:
>
> > Hello there,
> >
> > I designed a row key, for queries that need the best performance
> > (~100 ms), which looks like this:
> >
> > userId-date-sessionId
> >
> > These queries (scans) are always based on a userId and sometimes
> > additionally on a date, too.
> > That's no problem with the key above.
> >
> > However, another kind of query will be based on a given time range,
> > where the leftmost userId is not given or known.
> > In this case I need to get all rows covering the given time range, with
> > their date, to create a daily report.
> >
> > As I can't put a wildcard at the beginning of a left-based index for the
> > scan, the only option I see is to scan the index of the whole table and
> > collect the rowKeys that fall inside the time range I'm interested in.
> >
> > Is there a more elegant way to collect rows within time range X?
> > (Unfortunately, the date attribute is not equal to the timestamp that is
> > stored by HBase automatically.)
> >
> > Could/should one maybe leverage some kind of row key caching to
> > accelerate the collection process?
> > Is that covered by the block cache?
> >
> > Thanks in advance for any advice.
> >
> > regards
> > Chris
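The original question contrasts two kinds of query. A query by userId, or by userId plus date, is cheap. A query by date range alone is not. The reason is that HBase stores rows sorted by row key. The sketch below is a language-neutral model in Python, not HBase client code, and it makes some assumptions:

* A sorted list of ASCII strings stands in for the table. These sort the same way as HBase's byte comparison.
* `make_key`, `prefix_scan` and `date_range_full_scan` are hypothetical helpers.
* The key fields are fixed-width and contain no '-'.

The sketch shows two cases. A known userId prefix lets a scan seek straight to the matching rows. A date range without a userId has no prefix to seek to, so every key must be read.

```python
from bisect import bisect_left

# Hypothetical model: a sorted list of key strings stands in for an HBase
# table, since HBase keeps rows ordered by row key.

def make_key(user_id, date, session_id):
    # Assumes fixed-width fields (zero-padded ids, yyyyMMdd dates) and no '-'
    # inside a field, so string order matches the intended logical order.
    return f"{user_id}-{date}-{session_id}"

def prefix_scan(keys, prefix):
    # Seek to the first key >= prefix, then read while the prefix still matches,
    # like a scan with a start row and a prefix-based stop condition.
    hits = []
    i = bisect_left(keys, prefix)
    while i < len(keys) and keys[i].startswith(prefix):
        hits.append(keys[i])
        i += 1
    return hits

def date_range_full_scan(keys, start_date, end_date):
    # No leading userId means no prefix to seek to: every key is read and its
    # date component checked (start inclusive, end exclusive).
    hits = []
    for key in keys:
        _user, date, _session = key.split("-")
        if start_date <= date < end_date:
            hits.append(key)
    return hits

keys = sorted([
    make_key("u001", "20120730", "s1"),
    make_key("u001", "20120731", "s2"),
    make_key("u002", "20120730", "s3"),
    make_key("u002", "20120801", "s4"),
])
print(prefix_scan(keys, "u001-"))           # one user, all dates
print(prefix_scan(keys, "u002-20120801-"))  # one user, one date
print(date_range_full_scan(keys, "20120730", "20120731"))
```

The first two calls touch only the matching keys. The third call has to visit all four.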
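Matt's rule of thumb (estimate what share of the 64KB blocks a query needs) can be turned into a rough number. The following is a hypothetical back-of-envelope model, not something HBase computes. It assumes the following:

* Rows are sorted by userId, then date.
* Each user's data is spread evenly over `days_per_user` days.
* Every user has data on the queried day.

Under those assumptions, each user's matching rows form one contiguous run of bytes. A run of length L at a random offset overlaps on average 1 + L/block_size blocks. Dividing that by the number of blocks the user's data fills gives the fraction read. The function name and parameters are invented for illustration.

```python
BLOCK_SIZE = 64 * 1024  # the default block size Matt quotes (64KB)

def fraction_of_blocks_read(user_bytes, days_per_user, days_queried=1,
                            block_size=BLOCK_SIZE):
    # Hypothetical model: rows sorted by userId then date, each user's data
    # spread evenly over days_per_user days, every user active on the queried
    # days. Returns a rough expected fraction of blocks the query must read.
    run = user_bytes * days_queried / days_per_user  # contiguous bytes per user
    # A run of `run` bytes at a random offset overlaps 1 + run / block_size
    # blocks on average.
    touched = 1 + run / block_size
    # Blocks filled by one user's data (below 1 when users share a block).
    occupied = user_bytes / block_size
    return min(1.0, touched / occupied)

for user_bytes in (16 * 1024, 1024 * 1024, 10 * 1024 * 1024):
    f = fraction_of_blocks_read(user_bytes, days_per_user=30)
    print(f"{user_bytes // 1024:>6} KB per user -> {f:.3f} of blocks read")
```

Here is how the estimate behaves for a one-day query:

* With 16 KB per user, several users share each block, and the estimate saturates at 1.0. Every block is needed, so the simple table scan wins.
* With 1 MB per user spread over 30 days, it drops to about 0.096. This is the regime where Matt says a secondary index starts to pay off.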
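Jerry's suggestion, which is also Matt's last step, has two parts. First, an index table keyed by date whose entries point back at primary row keys. Second, primary-key lookups issued in parallel. The sketch below models both tables in memory, with these simplifications:

* `IndexedStore` and its methods are hypothetical.
* The thread pool only stands in for concurrent client requests. A real client would more likely batch its Gets.
* The `put` method makes the consistency concern from the thread visible: the primary row and the index row are two independent writes.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

class IndexedStore:
    """Hypothetical in-memory model: a primary table plus a date-first index."""

    def __init__(self):
        self.primary = {}  # primary row key "userId-date-sessionId" -> value
        self.index = []    # sorted index keys "date-userId-sessionId"

    def put(self, user_id, date, session_id, value):
        # Two independent writes. In HBase they would land in different rows
        # (usually different regions), so nothing makes them atomic together.
        self.primary[f"{user_id}-{date}-{session_id}"] = value
        index_key = f"{date}-{user_id}-{session_id}"
        i = bisect_left(self.index, index_key)
        if i == len(self.index) or self.index[i] != index_key:
            self.index.insert(i, index_key)

    def keys_for_dates(self, start_date, end_date):
        # Range scan over the index: start_date inclusive, end_date exclusive.
        lo = bisect_left(self.index, start_date)
        hi = bisect_left(self.index, end_date)
        keys = []
        for index_key in self.index[lo:hi]:
            date, user_id, session_id = index_key.split("-")
            keys.append(f"{user_id}-{date}-{session_id}")
        return keys

    def query_dates(self, start_date, end_date, workers=8):
        primary_keys = self.keys_for_dates(start_date, end_date)
        # Primary-key lookups issued concurrently, as Jerry suggests.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            values = list(pool.map(self.primary.get, primary_keys))
        return dict(zip(primary_keys, values))

store = IndexedStore()
store.put("u001", "20120730", "s1", "a")
store.put("u001", "20120731", "s2", "b")
store.put("u002", "20120730", "s3", "c")
print(store.query_dates("20120730", "20120731"))
```

Suppose a crash happens between the two writes in `put`. The result would be a primary row that the date index never learns about, which is the kind of inconsistency Chris wants to avoid.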
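Chris's plan of a parallel table scan amounts to cutting the key space into contiguous ranges and scanning them concurrently. A MapReduce job over the table would do something similar, since by default it creates one input split per region. In the sketch below, `split_points` are hypothetical stand-ins for region boundaries. The helper names are invented for illustration, and key fields are again assumed to contain no '-'.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def scan_range(keys, start, stop, start_date, end_date):
    # Scan the key range [start, stop) (stop=None means "to the end") and keep
    # rows whose date component lies in [start_date, end_date).
    lo = bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect_left(keys, stop)
    hits = []
    for key in keys[lo:hi]:
        _user, date, _session = key.split("-")
        if start_date <= date < end_date:
            hits.append(key)
    return hits

def parallel_date_scan(keys, split_points, start_date, end_date, workers=4):
    # split_points stand in for region boundaries: each worker scans one
    # contiguous key range, and results are concatenated in key order.
    bounds = [""] + list(split_points) + [None]
    ranges = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda r: scan_range(keys, r[0], r[1], start_date, end_date),
            ranges))
    return [key for part in parts for key in part]

keys = sorted([
    "u001-20120730-s1", "u001-20120731-s2",
    "u002-20120730-s3", "u002-20120801-s4",
    "u003-20120730-s5",
])
print(parallel_date_scan(keys, ["u002", "u003"], "20120730", "20120731"))
```

Every row is still read. Parallelism only cuts wall-clock time, not total I/O. That fits Matt's advice to start with the table scan when latency is not a concern.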