From: lars hofhansl
To: user@hbase.apache.org
Date: Wed, 19 Dec 2012 18:06:01 -0800 (PST)
Subject: Re: Is it necessary to set MD5 on rowkey?

Mike, please think about what you write before you write it.
You will most definitely not need a full table scan (much less a *FULL*
*TABLE* *SCAN* ;-) ).

Read Alex's blog post again, it's a good post (IMHO).
He is talking about buckets.

-- Lars

________________________________
From: Michael Segel
To: user@hbase.apache.org
Sent: Wednesday, December 19, 2012 5:23 PM
Subject: Re: Is it necessary to set MD5 on rowkey?

Ok,

Let's try this one more time...

If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
retrieve the row.
If you do something like a salt that uses only a preset of N
combinations, you will have to do N get()s in order to fetch the row.

This is bad. VERY BAD.

If you hash the row, you will get a consistent value each time you hash
the key. If you use SHA-1, a collision is mathematically possible,
however highly improbable. So people have recommended appending the key
to the hash to form the new key. Here, you might as well truncate the
hash to just the most significant byte or two and then append the key.
This will give you enough of an even distribution that you can avoid hot
spotting.

So if I use the hash, I can effectively still get the row of data back
with a single get(). Otherwise it's a full table scan.

Do you see the difference?


On Dec 19, 2012, at 7:11 PM, Jean-Marc Spaggiari wrote:

> Hi Mike,
>
> If in your business case, the only thing you need when you retrieve
> your data is to do a full scan over MR jobs, then you can salt with
> whatever you want.
> Hash, random values, etc.
>
> If you know you have x regions, then you can simply do a round-robin
> salting, or a random salting over those x regions.
>
> Then when you run your MR job, you discard the first bytes, and do
> what you want with your data.
>
> So I also think that salting can still be useful. It all depends on
> what you do with your data.
>
> Just my opinion.
>
> JM
>
> 2012/12/19, Michael Segel:
>> Ok...
>>
>> So you use a random byte or two at the front of the row.
>> How do you then use get() to find the row?
>> How do you do a partial scan()?
>>
>> Do you start to see the problem?
>> The only way to get to the row is to do a full table scan. That kills
>> HBase and you would be better off going with a partitioned Hive table.
>>
>> Using a hash of the key or a portion of the hash is not a salt.
>> That's not what I have a problem with. Each time you want to fetch the
>> key, you just hash it, truncate the hash and then prepend it to the
>> key. You will then be able to use get().
>>
>> Using a salt would imply using some form of modulo math to get a
>> round-robin prefix. Or a random number generator.
>>
>> That's the issue.
>>
>> Does that make sense?
>>
>>
>> On Dec 19, 2012, at 3:26 PM, David Arthur wrote:
>>
>>> Let's say you want to decompose a url into domain and path to include
>>> in your row key.
>>>
>>> You could of course just use the url as the key, but you will see
>>> hotspotting since most will start with "http". To mitigate this, you
>>> could add a random byte or two at the beginning (random salt) to
>>> improve distribution of keys, but you break single record Gets (and
>>> Scans arguably). Another approach is to use a hash-based salt: hash
>>> the whole key and use a few of those bytes as a salt.
>>> This fixes Gets but Scans are still not effective.
>>>
>>> One approach I've taken is to hash only a part of the key. Consider
>>> the following key structure
>>>
>>> <2 bytes of hash(domain)>
>>>
>>> With this you get 16 bits for a hash-based salt. The salt is
>>> deterministic so Gets work fine, and for a single domain the salt is
>>> the same so you can easily do Scans across a domain. If you had some
>>> further structure to your key that you wished to scan across, you
>>> could do something like:
>>>
>>> <2 bytes of hash(domain)><2 bytes of hash(path)>
>>>
>>> It really boils down to identifying your access patterns and
>>> read/write requirements and constructing a row key accordingly.
>>>
>>> HTH,
>>> David
>>>
>>> On 12/18/12 6:29 PM, Michael Segel wrote:
>>>> Alex,
>>>> And that's the point. Salt as you explain it conceptually implies
>>>> that the number you are adding to the key to ensure a better
>>>> distribution means that you will have inefficiencies in terms of
>>>> scans and gets.
>>>>
>>>> Using a hash as either the full key, or taking the hash, truncating
>>>> it and appending the key may screw up scans, but your get() is
>>>> intact.
>>>>
>>>> There are other options like inverting the numeric key ...
>>>>
>>>> And of course doing nothing.
>>>>
>>>> Using a salt as part of the design pattern is bad.
>>>>
>>>> With respect to the OP, I was discussing the use of a hash and some
>>>> alternatives for how to implement the hash of a key.
>>>> Again, doing nothing may also make sense, if you understand the
>>>> risks and you know how your data is going to be used.
>>>>
>>>>
>>>> On Dec 18, 2012, at 11:36 AM, Alex Baranau wrote:
>>>>
>>>>> Mike,
>>>>>
>>>>> Please read the *full post* before judging. In particular, the
>>>>> "Hash-based distribution" section.
>>>>> You can find the same in HBaseWD small README file [1] (not sure if
>>>>> you read it at all before commenting on the lib). Round robin is
>>>>> mainly for explaining the concept/idea (though not only for that).
>>>>>
>>>>> Thank you,
>>>>> Alex Baranau
>>>>> ------
>>>>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase -
>>>>> ElasticSearch - Solr
>>>>>
>>>>> [1] https://github.com/sematext/HBaseWD
>>>>>
>>>>> On Tue, Dec 18, 2012 at 12:24 PM, Michael Segel wrote:
>>>>>
>>>>>> Quick answer...
>>>>>>
>>>>>> Look at the salt.
>>>>>> It's just a number from a round robin counter.
>>>>>> There is no tie between the salt and row.
>>>>>>
>>>>>> So when you want to fetch a single row, how do you do it?
>>>>>> ...
>>>>>> ;-)
>>>>>>
>>>>>> On Dec 18, 2012, at 11:12 AM, Alex Baranau wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> @Mike:
>>>>>>>
>>>>>>> I'm the author of that post :).
>>>>>>>
>>>>>>> Quick reply to your last comment:
>>>>>>>
>>>>>>> 1) Could you please describe why "the use of a 'Salt' is a very,
>>>>>>> very bad idea" in more specific way than "Fetching data takes more
>>>>>>> effort". Would be helpful for anyone who is looking into using
>>>>>>> this approach.
>>>>>>>
>>>>>>> 2) The approach described in the post also says you can prefix
>>>>>>> with the hash, you probably missed that.
>>>>>>>
>>>>>>> 3) I believe your answer, "use MD5 or SHA-1" doesn't help bigdata
>>>>>>> guy. Please re-read the question: the intention is to distribute
>>>>>>> the load while still being able to do "partial key scans".
>>>>>>> The blog post linked above explains one possible solution for
>>>>>>> that, while your answer doesn't.
>>>>>>>
>>>>>>> @bigdata:
>>>>>>>
>>>>>>> Basically, when it comes to solving two issues (distributing
>>>>>>> writes and being able to read data sequentially), you have to
>>>>>>> balance between being good at both of them. Very good presentation
>>>>>>> by Lars:
>>>>>>>
>>>>>>> http://www.slideshare.net/larsgeorge/hbase-advanced-schema-design-berlin-buzzwords-june-2012
>>>>>>> , slide 22. You will see how this is correlated. In short:
>>>>>>> * having an md5/other hash prefix of the key does better w.r.t.
>>>>>>>   distributing writes, while compromising the ability to do range
>>>>>>>   scans efficiently
>>>>>>> * having a very limited number of 'salt' prefixes still allows
>>>>>>>   range scans (less efficiently than normal range scans, of
>>>>>>>   course, but still good enough in many cases) while providing
>>>>>>>   worse distribution of writes
>>>>>>>
>>>>>>> In the latter case, by choosing the number of possible 'salt'
>>>>>>> prefixes (which could be derived from hashed values, etc.)
>>>>>>> you can balance between write-distribution efficiency and the
>>>>>>> ability to run fast range scans.
>>>>>>>
>>>>>>> Hope this helps
>>>>>>>
>>>>>>> Alex Baranau
>>>>>>> ------
>>>>>>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase -
>>>>>>> ElasticSearch - Solr
>>>>>>>
>>>>>>> On Tue, Dec 18, 2012 at 8:52 AM, Michael Segel
>>>>>>> <michael_segel@hotmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> First, the use of a 'Salt' is a very, very bad idea and I would
>>>>>>>> really hope that the author of that blog takes it down.
>>>>>>>> While it may solve an initial problem in terms of region hot
>>>>>>>> spotting, it creates another problem when it comes to fetching
>>>>>>>> data. Fetching data takes more effort.
>>>>>>>>
>>>>>>>> With respect to using a hash (MD5 or SHA-1) you are creating a
>>>>>>>> more random key that is unique to the record. Some would argue
>>>>>>>> that with MD5 or SHA-1 you could mathematically have a
>>>>>>>> collision, however you could then append the key to the hash to
>>>>>>>> guarantee uniqueness. You could also do things like take the
>>>>>>>> hash, truncate it to the first byte and then append the record
>>>>>>>> key. This should give you enough randomness to avoid hot spotting
>>>>>>>> after the initial region completion and you could pre-split out
>>>>>>>> any number of regions. (First byte 0-255 for values, so you can
>>>>>>>> program the split...
>>>>>>>>
>>>>>>>>
>>>>>>>> Having said that...
>>>>>>>> yes, you lose the ability to perform a sequential scan of the
>>>>>>>> data. At least to a point. It depends on your schema.
>>>>>>>>
>>>>>>>> Note that you need to think about how you are primarily going to
>>>>>>>> access the data. You can then determine the best way to store the
>>>>>>>> data to gain the best performance. For some applications... the
>>>>>>>> region hot spotting isn't an important issue.
>>>>>>>>
>>>>>>>> Note YMMV
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> -Mike
>>>>>>>>
>>>>>>>> On Dec 18, 2012, at 3:33 AM, Damien Hardy wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> There is a middle ground between sequential keys (hot spotting
>>>>>>>>> risk) and md5 (heavy scan):
>>>>>>>>> * you can use composed keys with a field that can segregate
>>>>>>>>>   data (hostname, productname, metric name) like OpenTSDB
>>>>>>>>> * or use Salt with a limited number of values (example
>>>>>>>>>   substr(md5(rowid),0,1) = 16 values)
>>>>>>>>>   so that a scan is a combination of 16 filters, one on each
>>>>>>>>>   salt value
>>>>>>>>>   you can base your code on HBaseWD by sematext
>>>>>>>>>
>>>>>>>>> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>>>>>>>>> https://github.com/sematext/HBaseWD
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2012/12/18 bigdata
>>>>>>>>>
>>>>>>>>>> Many articles tell me that using MD5 for the rowkey, or part
>>>>>>>>>> of it, is a good method to balance the records stored in
>>>>>>>>>> different parts. But if I want to search some sequential
>>>>>>>>>> rowkey records, such as a date as the rowkey or part of it,
>>>>>>>>>> I cannot use a rowkey filter to scan a range of date values in
>>>>>>>>>> one pass when the date is hashed by MD5. How to balance this
>>>>>>>>>> issue?
>>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Damien HARDY
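
The bucketed-salt idea debated in this thread (a prefix taken from a small, fixed set, e.g. substr(md5(rowid),0,1) = 16 values) can be sketched in a few lines. This is only an illustration under stated assumptions, not HBaseWD's code or the HBase client API: a sorted Python list stands in for HBase's sorted row keys, and the names BUCKETS, bucket_of, salted_key and range_scan are made up for the example. It shows that a deterministic prefix keeps single-row lookups to one get, and that a range scan turns into one sub-scan per bucket rather than a full table scan.

```python
import hashlib
from bisect import bisect_left

# Assumption: 16 buckets, one per hex digit, as in substr(md5(rowid),0,1).
BUCKETS = "0123456789abcdef"


def bucket_of(row_id: str) -> str:
    """Deterministic one-character prefix: first hex digit of md5(row_id)."""
    return hashlib.md5(row_id.encode("utf-8")).hexdigest()[0]


def salted_key(row_id: str) -> str:
    """Stored key is <bucket><row_id>; it can be recomputed from row_id alone."""
    return bucket_of(row_id) + row_id


def range_scan(sorted_keys, start: str, stop: str):
    """Emulate a range scan over original ids in [start, stop).

    Runs one sub-scan per bucket, then merges and re-sorts the results,
    because the buckets interleave the original ids.
    """
    hits = []
    for b in BUCKETS:
        lo = bisect_left(sorted_keys, b + start)
        hi = bisect_left(sorted_keys, b + stop)
        hits.extend(key[1:] for key in sorted_keys[lo:hi])
    return sorted(hits)


# Toy "table": a sorted list of keys stands in for the sorted row keys.
ids = [f"2012-12-{day:02d}" for day in range(1, 31)]
table = sorted(salted_key(i) for i in ids)

# Point lookup: recompute the prefix, then look up a single key.
assert salted_key("2012-12-18") in table

# Range scan: 16 small sub-scans instead of one scan.
print(range_scan(table, "2012-12-18", "2012-12-21"))
# prints ['2012-12-18', '2012-12-19', '2012-12-20']
```

With a random or round-robin prefix instead of bucket_of, the point lookup would have to try every bucket (N gets), which is the cost Mike objects to. With a full-hash prefix, the number of sub-scans grows with the size of the prefix space, which is the trade-off Alex describes.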