hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Query regarding pre-split Major compaction
Date Sat, 10 Sep 2016 01:34:44 GMT
Given an ingestion rate of 7 GB/hour, you would have ~5000 GB of data per month.
At the default region size of 10 GB, that's about 500 regions.
You would run out of ASCII characters for the position of #.
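
As a rough check of the arithmetic above, here is a minimal sketch. It assumes a steady 7 GB/hour, a 30-day month, and the 10 GB default region size mentioned later in this thread; the function name is hypothetical, and the 95-character figure is the count of printable ASCII characters (0x20 to 0x7E).

```python
import math

def estimate_regions(gb_per_hour, days, region_size_gb):
    """Return (total_gb, region_count) for a steady ingestion rate."""
    total_gb = gb_per_hour * 24 * days
    return total_gb, math.ceil(total_gb / region_size_gb)

# Assumed inputs: 7 GB/hour, 30 days, 10 GB per region.
total_gb, regions = estimate_regions(gb_per_hour=7, days=30, region_size_gb=10)

# Printable ASCII characters available as a one-character salt.
PRINTABLE_ASCII = 0x7F - 0x20

print(total_gb, regions)          # 5040 504
print(regions > PRINTABLE_ASCII)  # True
```

With one salt character per region, the salt alphabet is exhausted well before a month of data has been written.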

Since a mobile number is personally identifiable information, it is not
prudent to use it directly in the row key.
You can search the internet for commonly accepted practices.
If you use a hash of the mobile number, you would avoid hot spotting.
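
One possible shape for such a key, as a sketch only: the bucket count, the `NN_` prefix layout, and the function names below are assumptions for illustration, not something prescribed in this thread. It uses plain Python and no HBase client calls. Note that an unkeyed hash of a 10-digit number can be brute-forced, so a keyed hash would be needed if the goal is real privacy.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; must stay <= 100 for the 2-digit prefix

def salted_row_key(mobile_number, num_buckets=NUM_BUCKETS):
    """Build '<bucket>_<sha256 hex>' so the raw number never appears in the key."""
    digest = hashlib.sha256(mobile_number.encode("utf-8")).hexdigest()
    # Derive the bucket from the hash so the same number always maps
    # to the same prefix and prefixes are spread roughly evenly.
    bucket = int(digest[:8], 16) % num_buckets
    return f"{bucket:02d}_{digest}"

def split_points(num_buckets=NUM_BUCKETS):
    """Split keys '01'..'NN-1', giving one region per bucket prefix."""
    return [f"{b:02d}" for b in range(1, num_buckets)]

print(salted_row_key("9999999999"))
print(split_points(4))  # ['01', '02', '03']
```

The list returned by `split_points` has the same form as the SPLITS array in the create statement quoted below, so the table can be presplit to match the prefixes.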

More detail about your use case would help in providing a better answer.

Cheers

On Fri, Sep 9, 2016 at 6:09 PM, Manjeet Singh <manjeet.chandhok@gmail.com>
wrote:

> Thanks Ted for the links. They will help determine how regions split, what
> the size should be, etc., which is really helpful.
> But can you correct me if I am wrong: was my understanding, as asked in the
> trailing mail, correct?
> I know what the salt will be based on the mobile number coming in my data.
> So assume the salt for mobile number 9999999999 is #,
> so my rowkey is #_9999999999.
> As I know in advance what my exact rowkey is, I can distribute my data on the
> cluster to avoid hotspotting, and I want to distribute my data equally across
> the cluster.
> So is it mandatory to create the table according to my splits?
>
> Thanks
> Manjeet
>
> On Sat, Sep 10, 2016 at 6:26 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Please take a look at:
> >
> > http://hbase.apache.org/book.html#table_schema_rules_of_thumb
> > http://hbase.apache.org/book.html#arch.regions.size
> > http://hbase.apache.org/book.html#ops.capacity.regions
> > http://hbase.apache.org/book.html#ops.capacity.regions.total
> >
> > On Fri, Sep 9, 2016 at 5:35 PM, Manjeet Singh <manjeet.chandhok@gmail.com>
> > wrote:
> >
> > > Yes, it is on weekdays.
> > > Yes, the default is 10 GB, so what is the way/formula to know what the
> > > size of the RS should be?
> > > On 9 Sep 2016 19:03, "Ted Yu" <yuzhihong@gmail.com> wrote:
> > >
> > > > Can you clarify whether the incoming data rate is for weekdays?
> > > >
> > > > At 6-7 GB/hour, you need to set a larger region size.
> > > > Default is 10GB.
> > > >
> > > > If you know roughly how the key space would be filled, presplit your
> > > > table accordingly.
> > > >
> > > > On Thu, Sep 8, 2016 at 11:24 PM, Manjeet Singh <manjeet.chandhok@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All
> > > > >
> > > > > I have some basic questions. Can anyone help me out?
> > > > >
> > > > > Q1. This is my understanding: to perform splitting, I need to create
> > > > > the table like below:
> > > > > create 'test_table', 'c1', SPLITS => ['#', '!', '$']
> > > > >
> > > > > and I have to design the row key in this way:
> > > > > #_123456789
> > > > > !_123456789
> > > > > $_123456789
> > > > >
> > > > > so that my data is distributed across the cluster.
> > > > >
> > > > > My requirement is very simple: I want to distribute data equally
> > > > > across regions, based on my rowkey only.
> > > > >
> > > > > So please correct me if I am missing anything.
> > > > >
> > > > >
> > > > > Q2. If I have 5 regions on each region server and I give each region
> > > > > 100 MB of space using the hbase.hregion.max.filesize property,
> > > > >
> > > > > what will happen when all my regions fill up with 100 MB of data?
> > > > > Please note I have a cron job scheduled every weekend, and my incoming
> > > > > data rate is 6-7 GB/hour, so my regions get filled very fast.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Thanks
> > > > > Manjeet
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > luv all
> > > > >
> > > >
> > >
> >
>
>
>
> --
> luv all
>
