Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: local policy)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type;
  b=kJyHu+liTq5g1J4zRI5OE11xRkWNy3U3Y1CRNBit9EqOG5+cIBTKmgg6IirMGOCgxvcmdy9dFms+9P0GC6JA+Vvq7CPmZxDEtpkvGha0vsS//ektLjLLQKTa/I05UX86qHDBf7QOWfLphehr6oK6BnJSKYRNW0J+zKisFbqKMi0=;
References: <SNT002-W108E99E544990EECA76F137B5310@phx.gbl>
 <CADK=YxsumgHC07J+C1Z5s0isg7O17f8t6rJPbADjHFdSv5T_TA@mail.gmail.com>
 <BLU0-SMTP404EC8B27950638AA25D47A8F310@phx.gbl>
 <CAA7+SiAzR13bU_U-EeWqLhrX36-A8cUKuB59n9cg=68HM9EWmw@mail.gmail.com>
 <BLU0-SMTP100AA4A54A352F4E5C065958F310@phx.gbl>
 <CAA7+SiCuphmga_NS0Ujt+xiycRqeSC7TCYJ=mdQgt6QfGOAxQg@mail.gmail.com>
 <BLU0-SMTP93F19E706CC0ABE2D97068F310@phx.gbl> <50D23101.6080701@gmail.com>
 <BLU0-SMTP469B52113A309DF6AA651AD8F370@phx.gbl>
 <CAPQV63XB12UE9z=ui6GHc3H-teEd65+2N9SNSSvKYsHH-ovdFw@mail.gmail.com>
 <BLU0-SMTP100DAC0A0E06B063F7CF1948F370@phx.gbl>
Message-ID: <1355969161.11822.YahooMailNeo@web140601.mail.bf1.yahoo.com>
Date: Wed, 19 Dec 2012 18:06:01 -0800 (PST)
From: lars hofhansl <lhofhansl@yahoo.com>
Reply-To: lars hofhansl <lhofhansl@yahoo.com>
Subject: Re: Is it necessary to set MD5 on rowkey?
To: "user@hbase.apache.org" <user@hbase.apache.org>
In-Reply-To: <BLU0-SMTP100DAC0A0E06B063F7CF1948F370@phx.gbl>
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="969045052-980462469-1355969161=:11822"

--969045052-980462469-1355969161=:11822
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Mike, please think about what you write before you write it.
You will most definitely not need a full table scan (much less a
*FULL* *TABLE* *SCAN* ;-) ).

Read Alex's blog post again, it's a good post (IMHO). He is talking
about buckets.

-- Lars

________________________________
From: Michael Segel <michael_segel@hotmail.com>
To: user@hbase.apache.org
Sent: Wednesday, December 19, 2012 5:23 PM
Subject: Re: Is it necessary to set MD5 on rowkey?

Ok,

Let's try this one more time...

If you salt, you will have to do a *FULL* *TABLE* *SCAN* in order to
retrieve the row.
If you do something like a salt that uses only a preset of N
combinations, you will have to do N get()s in order to fetch the row.

This is bad. VERY BAD.

If you hash the row, you will get a consistent value each time you hash
the key. If you use SHA-1, a collision is mathematically possible but
highly improbable. So people have recommended appending the key to the
hash to form the new key. Here, you might as well truncate the hash to
just the most significant byte or two and then append the key. This will
give you enough of an even distribution that you can avoid hot spotting.

So if I use the hash, I can effectively still get the row of data back
with a single get(). Otherwise it's a full table scan.

Do you see the difference?

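As a rough illustration of the hash-prefix scheme just described, here
is a minimal sketch in plain Python. It does not use the HBase client
API: the function name prefixed_key and the dict standing in for a table
are hypothetical, and MD5 with a one-byte prefix is only one possible
choice.

```python
import hashlib

def prefixed_key(key: bytes, prefix_len: int = 1) -> bytes:
    """Prepend the leading byte(s) of MD5(key) to the key itself."""
    digest = hashlib.md5(key).digest()
    return digest[:prefix_len] + key

# A plain dict stands in for the table (assumption, not an HBase API).
table = {}
for k in (b"20121218-0001", b"20121218-0002", b"20121219-0001"):
    table[prefixed_key(k)] = b"value for " + k

# The prefix is recomputed from the key, so one lookup finds the row.
row = table[prefixed_key(b"20121218-0002")]
```

Because the prefix is a pure function of the key, the reader can always
rebuild the stored key; a random or round-robin prefix gives no such
guarantee.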

On Dec 19, 2012, at 7:11 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> Hi Mike,
>
> If in your business case, the only thing you need when you retrieve
> your data is to do full scan over MR jobs, then you can salt with
> whatever you want. Hash, random values, etc.
>
> If you know you have x regions, then you can simply do a round-robin
> salting, or a random salting over those x regions.
>
> Then when you run your MR job, you discard the first bytes, and do
> what you want with your data.
>
> So I also think that salting can still be useful. It all depends on
> what you do with your data.
>
> Just my opinion.
>
> JM
>
> 2012/12/19, Michael Segel <michael_segel@hotmail.com>:
>> Ok...
>>
>> So you use a random byte or two at the front of the row.
>> How do you then use get() to find the row?
>> How do you do a partial scan()?
>>
>> Do you start to see the problem?
>> The only way to get to the row is to do a full table scan. That kills HBase
>> and you would be better off going with a partitioned Hive table.
>>
>> Using a hash of the key or a portion of the hash is not a salt.
>> That's not what I have a problem with. Each time you want to fetch the key,
>> you just hash it, truncate the hash and then prepend it to the key. You will
>> then be able to use get().
>>
>> Using a salt would imply using some form of modulo math to get a
>> round robin prefix. Or a random number generator.
>>
>> That's the issue.
>>
>> Does that make sense?
>>
>> On Dec 19, 2012, at 3:26 PM, David Arthur <mumrah@gmail.com> wrote:
>>
>>> Let's say you want to decompose a url into domain and path to
>>> include in your row key.
>>>
>>> You could of course just use the url as the key, but you will see
>>> hotspotting since most will start with "http". To mitigate this,
>>> you could add a random byte or two at the beginning (random salt)
>>> to improve distribution of keys, but you break single record Gets
>>> (and Scans arguably). Another approach is to use a hash-based salt:
>>> hash the whole key and use a few of those bytes as a salt. This
>>> fixes Gets but Scans are still not effective.
>>>
>>> One approach I've taken is to hash only a part of the key. Consider
>>> the following key structure
>>>
>>> <2 bytes of hash(domain)><domain><path>
>>>
>>> With this you get 16 bits for a hash-based salt. The salt is
>>> deterministic so Gets work fine, and for a single domain the salt
>>> is the same so you can easily do Scans across a domain. If you had
>>> some further structure to your key that you wished to scan across,
>>> you could do something like:
>>>
>>> <2 bytes of hash(domain)><domain><2 bytes of hash(path)><path>
>>>
>>> It really boils down to identifying your access patterns and
>>> read/write requirements and constructing a row key accordingly.
>>>
>>> HTH,
>>> David
>>>
>>> On 12/18/12 6:29 PM, Michael Segel wrote:
>>>> Alex,
>>>> And that's the point. Salt as you explain it conceptually implies
>>>> that the number you are adding to the key to ensure a better
>>>> distribution means that you will have inefficiencies in terms of
>>>> scans and gets.
>>>>
>>>> Using a hash as either the full key, or taking the hash,
>>>> truncating it and appending the key may screw up scans, but your
>>>> get() is intact.
>>>>
>>>> There are other options like inverting the numeric key ...
>>>>
>>>> And of course doing nothing.
>>>>
>>>> Using a salt as part of the design pattern is bad.
>>>>
>>>> With respect to the OP, I was discussing the use of hash and some
>>>> alternatives to how to implement the hash of a key.
>>>> Again, doing nothing may also make sense, if you understand the
>>>> risks and you know how your data is going to be used.
>>>>
>>>> On Dec 18, 2012, at 11:36 AM, Alex Baranau
>>>> <alex.baranov.v@gmail.com> wrote:
>>>>
>>>>> Mike,
>>>>>
>>>>> Please read the *full post* before judging. In particular, the
>>>>> "Hash-based distribution" section. You can find the same in the
>>>>> small HBaseWD README file [1] (not sure if you read it at all
>>>>> before commenting on the lib). Round robin is mainly for
>>>>> explaining the concept/idea (though not only for that).
>>>>>
>>>>> Thank you,
>>>>> Alex Baranau
>>>>> ------
>>>>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase -
>>>>> ElasticSearch - Solr
>>>>>
>>>>> [1] https://github.com/sematext/HBaseWD
>>>>>
>>>>> On Tue, Dec 18, 2012 at 12:24 PM, Michael Segel
>>>>> <michael_segel@hotmail.com> wrote:
>>>>>
>>>>>> Quick answer...
>>>>>>
>>>>>> Look at the salt.
>>>>>> It's just a number from a round robin counter.
>>>>>> There is no tie between the salt and row.
>>>>>>
>>>>>> So when you want to fetch a single row, how do you do it?
>>>>>> ...
>>>>>> ;-)
>>>>>>
>>>>>> On Dec 18, 2012, at 11:12 AM, Alex Baranau
>>>>>> <alex.baranov.v@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> @Mike:
>>>>>>>
>>>>>>> I'm the author of that post :).
>>>>>>>
>>>>>>> Quick reply to your last comment:
>>>>>>>
>>>>>>> 1) Could you please describe why "the use of a 'Salt' is a
>>>>>>> very, very bad idea" in a more specific way than "Fetching
>>>>>>> data takes more effort". Would be helpful for anyone who is
>>>>>>> looking into using this approach.
>>>>>>>
>>>>>>> 2) The approach described in the post also says you can prefix
>>>>>>> with the hash, you probably missed that.
>>>>>>>
>>>>>>> 3) I believe your answer, "use MD5 or SHA-1" doesn't help
>>>>>>> bigdata guy. Please re-read the question: the intention is to
>>>>>>> distribute the load while still being able to do "partial key
>>>>>>> scans". The blog post linked above explains one possible
>>>>>>> solution for that, while your answer doesn't.
>>>>>>>
>>>>>>> @bigdata:
>>>>>>>
>>>>>>> Basically when it comes to solving two issues, distributing
>>>>>>> writes and having the ability to read data sequentially, you
>>>>>>> have to balance between being good at both of them. Very good
>>>>>>> presentation by Lars:
>>>>>>> http://www.slideshare.net/larsgeorge/hbase-advanced-schema-design-berlin-buzzwords-june-2012
>>>>>>> , slide 22. You will see how this is correlated. In short:
>>>>>>> * having an md5/other hash prefix of the key does better
>>>>>>>   w.r.t. distributing writes, while compromising the ability
>>>>>>>   to do range scans efficiently
>>>>>>> * having a very limited number of 'salt' prefixes still allows
>>>>>>>   range scans (less efficiently than normal range scans, of
>>>>>>>   course, but still good enough in many cases) while providing
>>>>>>>   worse distribution of writes
>>>>>>>
>>>>>>> In the latter case, by choosing the number of possible 'salt'
>>>>>>> prefixes (which could be derived from hashed values, etc.) you
>>>>>>> can balance between efficiency of distributing writes and the
>>>>>>> ability to run fast range scans.
>>>>>>>
>>>>>>> Hope this helps
>>>>>>>
>>>>>>> Alex Baranau
>>>>>>> ------
>>>>>>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase -
>>>>>>> ElasticSearch - Solr
>>>>>>>
>>>>>>> On Tue, Dec 18, 2012 at 8:52 AM, Michael Segel
>>>>>>> <michael_segel@hotmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> First, the use of a 'Salt' is a very, very bad idea and I
>>>>>>>> would really hope that the author of that blog take it down.
>>>>>>>> While it may solve an initial problem in terms of region hot
>>>>>>>> spotting, it creates another problem when it comes to
>>>>>>>> fetching data. Fetching data takes more effort.
>>>>>>>>
>>>>>>>> With respect to using a hash (MD5 or SHA-1) you are creating
>>>>>>>> a more random key that is unique to the record. Some would
>>>>>>>> argue that with MD5 or SHA-1 you could mathematically have a
>>>>>>>> collision; however, you could then append the key to the
>>>>>>>> hash to guarantee uniqueness. You could also do things like
>>>>>>>> take the hash and then truncate it to the first byte and
>>>>>>>> then append the record key. This should give you enough
>>>>>>>> randomness to avoid hot spotting after the initial region
>>>>>>>> completion and you could pre-split out any number of
>>>>>>>> regions. (First byte 0-255 for values, so you can program
>>>>>>>> the split...)
>>>>>>>>
>>>>>>>> Having said that... yes, you lose the ability to perform a
>>>>>>>> sequential scan of the data. At least to a point. It
>>>>>>>> depends on your schema.
>>>>>>>>
>>>>>>>> Note that you need to think about how you are primarily
>>>>>>>> going to access the data. You can then determine the best
>>>>>>>> way to store the data to gain the best performance. For
>>>>>>>> some applications... the region hot spotting isn't an
>>>>>>>> important issue.
>>>>>>>>
>>>>>>>> Note YMMV
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> -Mike
>>>>>>>>
>>>>>>>> On Dec 18, 2012, at 3:33 AM, Damien Hardy
>>>>>>>> <dhardy@viadeoteam.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> There is a middle ground between sequential keys (hot
>>>>>>>>> spotting risk) and md5 (heavy scan):
>>>>>>>>> * you can use composed keys with a field that can
>>>>>>>>>   segregate data (hostname, productname, metric name) like
>>>>>>>>>   OpenTSDB
>>>>>>>>> * or use Salt with a limited number of values (example
>>>>>>>>>   substr(md5(rowid),0,1) = 16 values) so that a scan is a
>>>>>>>>>   combination of 16 filters, one on each salt value
>>>>>>>>>
>>>>>>>>> you can base your code on HBaseWD by sematext
>>>>>>>>>
>>>>>>>>> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
>>>>>>>>> https://github.com/sematext/HBaseWD
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> 2012/12/18 bigdata <bigdatabase@outlook.com>
>>>>>>>>>
>>>>>>>>>> Many articles tell me that using MD5 for the rowkey, or
>>>>>>>>>> part of it, is a good method to balance the records
>>>>>>>>>> stored in different parts. But if I want to search some
>>>>>>>>>> sequential rowkey records, such as with a date as the
>>>>>>>>>> rowkey or part of it, I cannot use a rowkey filter to
>>>>>>>>>> scan a range of date values in one pass when the date is
>>>>>>>>>> hashed by MD5. How do I balance this?
>>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Damien HARDY
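
For reference, the bucket idea from the thread (a small fixed set of
salt prefixes derived from a hash, as in Damien's
substr(md5(rowid),0,1) example) can be sketched roughly as follows.
This is plain Python, not the HBaseWD or HBase client API; a sorted
list stands in for the table, and the names salted, scan and
bucketed_scan are hypothetical.

```python
import hashlib
from bisect import bisect_left
from heapq import merge

BUCKETS = "0123456789abcdef"  # assumption: one hex digit, 16 buckets

def salted(key: str) -> str:
    """Deterministic salt: first hex digit of MD5(key), then the key."""
    return hashlib.md5(key.encode()).hexdigest()[0] + ":" + key

def scan(rows, start, stop):
    """Range scan over a sorted list of keys, [start, stop)."""
    lo = bisect_left(rows, start)
    hi = bisect_left(rows, stop)
    return rows[lo:hi]

def bucketed_scan(rows, start, stop):
    """One sub-scan per bucket, merged back into original key order."""
    parts = []
    for b in BUCKETS:
        hits = scan(rows, b + ":" + start, b + ":" + stop)
        parts.append([h[2:] for h in hits])  # strip the salt
    return list(merge(*parts))

keys = ["2012-12-%02d" % d for d in range(1, 32)]
rows = sorted(salted(k) for k in keys)

# Point lookup: recompute the bucket, so a single probe is enough.
probe = salted("2012-12-19")
found = rows[bisect_left(rows, probe)] == probe

# Range scan: 16 short scans instead of one, not a full table scan.
week = bucketed_scan(rows, "2012-12-10", "2012-12-17")
```

With a hash-derived bucket a single-row read costs one probe, and a
range read costs one short scan per bucket. With a round-robin or
random bucket the range read is the same, but a single-row read has to
try every bucket.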
--969045052-980462469-1355969161=:11822--