hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Read access pattern
Date Tue, 30 Apr 2013 16:17:54 GMT
Well, those are *some* words :) Anyway, can you explain in a bit more detail
why you feel so strongly about this design/approach? Salting is not the only
option mentioned here; static hashing can be used as well. Plus, even in the
case of salting, wouldn't the distributed scan take care of it? The downside
that I see is the bucket_number, which we have to maintain at both read and
write time and update in case of cluster restructuring.

Thanks,
Shahab
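
A minimal Java sketch of the distinction Michael draws in the quoted message
(a random salt versus a prefix taken from the key's own MD5 hash). The class
name, the "NN-" salted key layout and the bucket count are assumptions made
for illustration; none of them come from the article or from HBase itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class KeyPrefixes {
    // Deterministic prefix: the first hexChars hex digits of MD5(id).
    // A reader who knows id can recompute it and go straight to the row.
    static String hashPrefix(String id, int hexChars) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.substring(0, hexChars);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Random salt: the writer picked one of `buckets` prefixes at random, so
    // a reader has to probe every candidate key (hypothetical "NN-" layout).
    static List<String> saltedCandidates(String id, int buckets) {
        List<String> keys = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            keys.add(String.format("%02d-%s", b, id));
        }
        return keys;
    }

    public static void main(String[] args) {
        // One key to fetch with the hash prefix ...
        System.out.println(hashPrefix("abc", 4) + "-abc");
        // ... versus one candidate per bucket with a random salt.
        System.out.println(saltedCandidates("abc", 4));
    }
}
```

With the hash prefix a point read stays a single lookup; with a random salt
the read fans out to one lookup per bucket, which is the bucket_number
bookkeeping mentioned above.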


On Tue, Apr 30, 2013 at 11:57 AM, Michael Segel
<michael_segel@hotmail.com> wrote:

> Geez that's a bad article.
> Never salt.
>
> And yes there's a difference between using a salt and using the first 2-4
> bytes from your MD5 hash.
>
> (Hint: Salts are random. Your hash isn't. )
>
> Sorry to be-itch but it's a bad idea and it shouldn't be propagated.
>
> On Apr 29, 2013, at 10:17 AM, Shahab Yunus <shahab.yunus@gmail.com> wrote:
>
> > I think you cannot simply use the scanner to do a range scan here, as your
> > keys are not monotonically increasing. You need to apply logic to
> > decode/reverse the mechanism that you used to hash your keys at
> > write time. You might want to check out the SemaText library, which
> > does distributed scans and seems to handle the scenarios that you want to
> > implement.
> >
> > http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> >
> >
> > On Mon, Apr 29, 2013 at 11:03 AM, <ricla@laposte.net> wrote:
> >
> >> Hi,
> >>
> >> I have a rowkey defined by:
> >>        getMD5AsHex(Bytes.toBytes(myObjectId)) + String.format("%19d\n",
> >> (Long.MAX_VALUE - changeDate.getTime()));
> >>
> >> How could I get the previous and next row for a given rowkey?
> >> For instance, I have the following ordered keys:
> >>
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370673172227807
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
> >>> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674987271807
> >>
> >> If I choose the rowkey
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468862807, what would be the
> >> correct scan to get the previous and next keys?
> >> The result would be:
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674468022807
> >> 00003db1b6c1e7e7d2ece41ff2184f76*9223370674984237807
> >>
> >> Thank you !
> >> R.
> >>
> >>
>
>
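
A minimal sketch of the previous/next lookup R. asks about in the original
question. It uses no HBase calls: a TreeSet stands in for the table, since
HBase keeps rows sorted by key bytes and, for these ASCII keys, that matches
String ordering. The class name and the 32-character prefix check (the hex
MD5 of the object id) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

class RowNeighbors {
    // Length of the hex-encoded MD5 prefix that identifies one object.
    static final int PREFIX_LEN = 32;

    // Stand-in for the table: keys kept in sorted order.
    private final TreeSet<String> rows = new TreeSet<>();

    RowNeighbors(List<String> keys) {
        rows.addAll(keys);
    }

    // Greatest key strictly smaller than `key`, if it belongs to the same object.
    Optional<String> previous(String key) {
        return sameObject(key, rows.lower(key));
    }

    // Smallest key strictly greater than `key`, if it belongs to the same object.
    Optional<String> next(String key) {
        return sameObject(key, rows.higher(key));
    }

    private static Optional<String> sameObject(String key, String candidate) {
        if (candidate == null
                || !candidate.startsWith(key.substring(0, PREFIX_LEN))) {
            return Optional.empty();
        }
        return Optional.of(candidate);
    }

    public static void main(String[] args) {
        String p = "00003db1b6c1e7e7d2ece41ff2184f76*";
        RowNeighbors table = new RowNeighbors(Arrays.asList(
                p + "9223370673172227807",
                p + "9223370674468022807",
                p + "9223370674468862807",
                p + "9223370674984237807",
                p + "9223370674987271807"));
        String chosen = p + "9223370674468862807";
        System.out.println(table.previous(chosen));
        System.out.println(table.next(chosen));
    }
}
```

With the five keys from the question, previous and next of the chosen key are
the two keys R. lists as the expected result. Against a real table the "next"
row would come from a forward Scan that starts just after the chosen key and
stops after one row. The "previous" row needs a backwards walk; as far as I
know the client API at the time of this thread did not offer one, and later
HBase releases added Scan.setReversed(true) for it.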
