From: lars hofhansl
Reply-To: lars hofhansl
To: "user@hbase.apache.org"
Subject: Re: Read access pattern
Date: Tue, 30 Apr 2013 22:12:24 -0700 (PDT)
Message-ID: <1367385144.29133.YahooMailNeo@web140606.mail.bf1.yahoo.com>
References: <1066821727.24555.1367247785075.JavaMail.www@wwinf8306>

I do not want to be rude or anything... But how often do we need to have this discussion?

When you salt your rowkeys with, say, 10 salt values, then for each read you need to fork off 10 read requests, and each of them touches only 1/10th of the table (which works nicely with HBase's prefix scans).

Obviously, if you only need point gets you wouldn't salt, that would be stupid. If you mostly do range scans, then salting is quite nice.

Saying that salting is bad because it does not work for point gets is like saying that bulldozers are bad because you cannot use them on race tracks. :)


-- Lars



________________________________
 From: Michael Segel
To: user@hbase.apache.org
Sent: Tuesday, April 30, 2013 10:06 AM
Subject: Re: Read access pattern


Sure.

By definition, the salt number is a random seed that is not associated with the underlying record.
A simple example is a round robin counter (mod the counter by 10, yielding [0..9]).

So you get a record, prepend your salt and you write it out to HBase. The salt will push the data out to a different region.

But what happens when you want to read the data?

So on a full table scan... no biggie, it's the same.

But suppose I want to do a partial table scan. Now I have to do multiple partial scans because I don't know the salt.
Or if I want to do a simple get() I now have to do N get()s, where N is the number of salt values allowed. In my example that's 10.

And that's the problem.

You are better off doing a hash of the record, using the first couple of bytes of the hash as the prefix, and then writing the record out.
You want the record? Take the key, hash it using the same process, and you have 1 get().

You're still screwed on doing a range scan, but you can't have everything.

THIS IS WHY I AND MANY CARDIOLOGISTS SAY NO TO SALT. The only difference is that they are talking about excess sodium chloride in your diet. I'm talking about using a salt aka 'random seed'.

Does that make sense?


On Apr 30, 2013, at 11:17 AM, Shahab Yunus wrote:

> Well those are *some* words :) Anyway, can you explain in a bit more detail
> why you feel so strongly about this design/approach? The salting here is
> not the only option mentioned and static hashing can be used as well. Plus
> even in case of salting, wouldn't the distributed scan take care of it? The
> downside that I see is the bucket_number that we have to maintain both at
> time of reading/writing and update in case of cluster restructuring.
>
> Thanks,
> Shahab
>
>
> On Tue, Apr 30, 2013 at 11:57 AM, Michael Segel wrote:
>
>> Geez that's a bad article.
>> Never salt.
>>
>> And yes there's a difference between using a salt and using the first 2-4
>> bytes from your MD5 hash.
>>
>> (Hint: Salts are random. Your hash isn't.)
>>
>> Sorry to be-itch but it's a bad idea and it shouldn't be propagated.
>>
>> On Apr 29, 2013, at 10:17 AM, Shahab Yunus wrote:
>>
>>> I think you cannot use the scanner simply to do a range scan here as your
>>> keys are not monotonically increasing. You need to apply logic to
>>> decode/reverse the mechanism that you have used to hash your keys at the
>>> time of writing. You might want to check out the SemaText library which
>>> does distributed scans and seems to handle the scenarios that you want to
>>> implement.
>>>
>>> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>>>
>>>
>>> On Mon, Apr 29, 2013 at 11:03 AM, wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a rowkey defined by :
>>>>      getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
>>>> (Long.MAX_VALUE - changeDate.getTime()));
>>>>
>>>> How could I get the previous and next row for a given rowkey ?
>>>> For instance, I have the following ordered keys :
>>>>
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807
>>>>
>>>> If I choose the rowkey :
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807, what would be the
>>>> correct scan to get the previous and next key ?
>>>> Result would be :
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
>>>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
>>>>
>>>> Thank you !
>>>> R.
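
For readers following the salt-versus-hash argument in this thread, here is a minimal sketch of the difference on the read path. It is not code from any poster and it does not use the HBase client API: plain strings stand in for row keys, and the class name, the method names, the "-" separator, and the choice of "first MD5 byte modulo 10" as the prefix are all assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: plain strings stand in for row keys.
final class RowKeyPrefixSketch {
    // The thread's example uses 10 salt values, i.e. prefixes 0..9.
    static final int BUCKETS = 10;

    // Hash-derived prefix: recomputable from the record key alone,
    // so a point lookup can target exactly one row key.
    static String hashedKey(String recordKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(recordKey.getBytes(StandardCharsets.UTF_8));
            // Assumption: the first digest byte modulo BUCKETS picks the prefix.
            int bucket = (digest[0] & 0xFF) % BUCKETS;
            return bucket + "-" + recordKey;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    // Salt prefix (for example a round-robin counter): unknown at read time,
    // so a reader has to try every possible prefix.
    static List<String> saltedCandidates(String recordKey) {
        List<String> candidates = new ArrayList<>();
        for (int salt = 0; salt < BUCKETS; salt++) {
            candidates.add(salt + "-" + recordKey);
        }
        return candidates;
    }

    public static void main(String[] args) {
        String recordKey = "object-123";
        System.out.println("hashed prefix, one key to get:  " + hashedKey(recordKey));
        System.out.println("salted prefix, keys to try:     " + saltedCandidates(recordKey));
    }
}
```

In both schemes a range scan over the logical key space still needs one prefix scan per bucket, which is the point both Lars and Michael make above; the two approaches differ only in how many requests a point get costs.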