hbase-user mailing list archives

From Talat Uyarer <ta...@uyarer.com>
Subject Re: Regions and Rowkeys
Date Tue, 12 May 2015 15:32:54 GMT
Hi Arun,

Please read this document:
http://hbase.apache.org/book.html#rowkey.design
I think it will help you figure out the rowkey design.

Talat
On May 12, 2015 6:00 PM, "Arun Patel" <arunp.bigdata@gmail.com> wrote:

> Thank you all.  This info is really helpful.  I have a follow-up question
> related to my use case.
>
> I need to create a table called LOGS to log event info generated from
> multiple services that I am calling.
>
> For each rowkey (a random UUID generated in this case), multiple services
> are called and the success or failure status is logged into the LOGS table.
> So, my data is something like this...
>
> Rowkey         Filename  Service   Message
> 312sdasd31244  file1     service1  success
> 312sdasd31244  file1     service2  success
> 312sdasd31244  file2     service1  failure: Reason for failure
> ....
> ..
> .
> 789sdfsf34234  file1     service1  success
> 789sdfsf34234  file1     service2  success
> 789sdfsf34234  file2     service3  failure: Reason for failure
> ...
> ..
> .
>
>
> This log info will be accessed by a polling service to track the progress
> of a rowkey using the REST API.
>
> So, basically the polling service will do a GET on the rowkey with a filter
> on the filename, something like this...
>
> get 'LOGS', '312sdasd31244', {FILTER => "ColumnPrefixFilter('file1')"}
>
> So, my question is: how should the rowkey be designed?  Salting may not
> help because the access pattern is not random. I will be scanning a range
> of rows.
>
> What are the other factors I need to consider to make this really effective
> for this use case?
>
> Regards,
> Arun
>
>
> On Tue, May 12, 2015 at 9:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Arun:
> > See the following for details:
> >
> > http://hbase.apache.org/book.html#_determining_split_points
> >
> >
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html
> >
> > Cheers
> >
> > On Tue, May 12, 2015 at 6:11 AM, Talat Uyarer <talat@uyarer.com> wrote:
> >
> > > Hi Arun,
> > >
> > > It is the MD5 checksum of the rowkeys. HBase decides which region
> > > stores a row based on its rowkey. With HexStringSplit, RegionSplitter
> > > generates region start keys in the same format as an MD5 checksum.
> > >
> > > Talat
> > >
> > >
> > >
> > > 2015-05-12 15:48 GMT+03:00 Arun Patel <arunp.bigdata@gmail.com>:
> > > > Thank you.  This helps.
> > > >
> > > > So, when I pre-split regions with the command below, is SPLITALGO
> > > > creating the rowkey boundaries for each region?
> > > >
> > > > create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
> > > >
> > > > I am failing to understand HexStringSplit.  As per the documentation,
> > > > the format of a HexStringSplit region boundary is the ASCII
> > > > representation of an MD5 checksum, or any other uniformly distributed
> > > > hexadecimal value.
> > > >
> > > > My Question is MD5 Checksum of what?
> > > >
> > > > Regards,
> > > > Arun
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, May 11, 2015 at 8:57 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > >
> > > > >> On Mon, May 11, 2015 at 3:38 PM, Arun Patel <arunp.bigdata@gmail.com> wrote:
> > > >>
> > > > >> > 1) I have a 10 node HBase cluster.  When I create a table in
> > > > >> > HBase, how many regions will be allocated by default?
> > > >>
> > > >>
> > > > >> In HBase, the number of region servers is orthogonal to table
> > > > >> partitions. These two operational details are related but managed
> > > > >> independently.
> > > >>
> > > > >> > I looked at the HBase Master UI and it seems regions are not
> > > > >> > allocated to all the Regionservers by default.  How can I
> > > > >> > allocate the regions in all Region Servers?
> > > >>
> > > >>
> > > > >> HBase will evenly balance the regions of all tables it's hosting
> > > > >> across all region servers in the cluster. If you have fewer
> > > > >> regions than region servers, some servers will have no regions to
> > > > >> host.
> > > >>
> > > > >> > Basically, this distributes the data in a better way if I am
> > > > >> > using a salted key. My requirement is to distribute the data
> > > > >> > across the cluster using salted keys.  But is having few regions
> > > > >> > a constraint?
> > > >> >
> > > >>
> > > > >> You're moving in the right direction. The next step would be to
> > > > >> split your table according to some prefix value, presumably
> > > > >> related to your "salting" choice. This will depend on what value
> > > > >> you're prepending to the row keys and the cardinality of those
> > > > >> values. Apache Phoenix does this, for example, with a fixed byte
> > > > >> prefix and one pre-split per salt-byte value (i.e., 0, 1, 2, 3,
> > > > >> ... 15).
> > > >>
> > > > >> > 2) How does the rowkey to region mapping work?  In Cassandra, we
> > > > >> > have a concept of assigning a token range to each node.  A rowkey
> > > > >> > will be assigned to a node based on the token range.  How does
> > > > >> > this work in HBase?
> > > >>
> > > >>
> > > > >> HBase is ordered and range-partitioned. Basically, your row keys
> > > > >> are sorted and region boundaries are determined at points within
> > > > >> that range. So if you have rows 'a' - 'z', HBase will define
> > > > >> regions as contiguous segments of this range, 'a' - 'f', and
> > > > >> 'g' - 'k' for example. The range of a region is dictated primarily
> > > > >> by the amount of data contained therein. When a region becomes too
> > > > >> big, it will be split in half and two child regions are created
> > > > >> (i.e., 'a' - 'f' becomes 'a' - 'c' and 'd' - 'f'). Once a region
> > > > >> splits, the children are independent and can be moved to other
> > > > >> region servers.
> > > >>
> > > > >> I explain a bit of this and more in my talk "HBase for Architects".
> > > > >> I link to a video from my blog [0]. As Michael mentioned, there's
> > > > >> more detail published in both our book [1], as well as our other
> > > > >> books [2], [3].
> > > >>
> > > >> Welcome to HBase ;)
> > > >> -n
> > > >>
> > > >> [0]: http://www.n10k.com/blog/hbase-for-architects-redux/
> > > >> [1]: https://hbase.apache.org/book.html#regions.arch
> > > >> [2]: http://www.manning.com/dimidukkhurana/
> > > >> [3]: http://shop.oreilly.com/product/0636920033943.do
> > > >>
> > >
> > >
> > >
> > > --
> > > Talat UYARER
> > > Website: http://talat.uyarer.com
> > > Twitter: http://twitter.com/talatuyarer
> > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
> > >
> >
>
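On the "MD5 checksum of what?" question quoted above, here is a minimal
sketch of how pre-split boundaries could be derived. It assumes that
HexStringSplit simply divides the hex key space "00000000" to "FFFFFFFF"
into equal parts, which is a reading of the RegionSplitter documentation
linked in the thread and not a copy of the HBase code. The function name
hex_split_points is made up for illustration. Under that assumption nothing
is hashed when the table is created; the MD5-like format describes what the
row keys are expected to look like.

```python
def hex_split_points(num_regions, first=0x00000000, last=0xFFFFFFFF, width=8):
    """Return num_regions - 1 evenly spaced hex boundaries.

    Hypothetical helper: it approximates the documented behaviour of
    HexStringSplit and is not part of any HBase API.
    """
    if num_regions < 2:
        return []  # a single region needs no split point
    # Size of the key range covered by each region (integer division).
    step = (last - first + 1) // num_regions
    # Zero-pad so lexicographic order matches numeric order.
    return [format(first + step * i, "0%dx" % width)
            for i in range(1, num_regions)]
```

For NUMREGIONS => 15, as in the create command quoted in the thread, this
gives 14 boundaries running from 11111111 to eeeeeeee.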

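The range-partitioning description in Nick's reply can be sketched as a
lookup in a sorted list of split points. This is only an illustration under
stated assumptions: a region's start key is inclusive and its end key is
exclusive, keys are compared as raw bytes, and the names region_index and
hashed_key are hypothetical. A real client finds regions through HBase's
own metadata, not through code like this.

```python
import bisect
import hashlib


def region_index(split_points, row_key):
    """Index of the region holding row_key, given sorted split points.

    Region 0 covers keys below split_points[0]; region i starts at
    split_points[i - 1] (inclusive) and ends before split_points[i].
    """
    return bisect.bisect_right(split_points, row_key)


def hashed_key(raw_id):
    """Hex MD5 of an identifier, as bytes.

    One way to obtain uniformly distributed hex row keys; whether hashing
    suits a given access pattern is a separate design decision.
    """
    return hashlib.md5(raw_id.encode("utf-8")).hexdigest().encode("ascii")
```

With split points b"g", b"n" and b"t", the key b"monkey" lands in region 1
and b"zebra" in region 3. A key produced by hashed_key can be looked up the
same way against hex boundaries such as b"40000000", b"80000000" and
b"c0000000".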