hbase-user mailing list archives

From Manjeet Singh <manjeet.chand...@gmail.com>
Subject Re: Query regarding pre-split Major compaction
Date Mon, 12 Sep 2016 05:43:47 GMT
Hi,
I am attaching a screenshot.
[image: Inline image 2]

Can anyone help me figure this out? First, I can see that my first region
is empty, as there is no start row key, and the same goes for the end row key.
Second, my data is actually distributed across only 2 nodes, although I have
5 nodes.
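
One possible reading of the empty first region, sketched under the assumption that the table was pre-split on the salt characters themselves (as in the create command quoted further down in this thread): N split points produce N+1 regions, each covering a half-open range [start key, end key). The first region has an empty start key and ends at the lowest split point, so it only receives keys that sort below every salt. The helper names below are hypothetical and this is plain Python, not HBase code.

```python
from bisect import bisect_right

# Split points as passed to SPLITS => [...]; region boundaries are ordered
# by byte value, so sort them the same way here.
split_points = sorted([b"!", b"#", b"$"])

def region_index(row_key: bytes) -> int:
    """Index of the region whose [start, end) range contains row_key."""
    return bisect_right(split_points, row_key)

def region_bounds(index: int):
    """(start_key, end_key) of a region; b"" marks an open-ended boundary."""
    start = split_points[index - 1] if index > 0 else b""
    end = split_points[index] if index < len(split_points) else b""
    return start, end

for key in [b"!_123456789", b"#_123456789", b"$_123456789"]:
    idx = region_index(key)
    print(key, "-> region", idx, region_bounds(idx))
```

Under these assumptions no salted key lands in region 0 (empty start key up to '!'), which would match an empty first region. With three split points there are only four regions in total, so the data cannot spread over more region servers than that, whatever the cluster size.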

Thanks
Manjeet


On Mon, Sep 12, 2016 at 10:38 AM, Manjeet Singh <manjeet.chandhok@gmail.com>
wrote:

> Thanks, Ted, for your inputs.
> I have written an algorithm that converts part of my string to a single
> character such as # $ ! etc. That character is my salt, so I know in
> advance what my salt is.
> My input data is very random and I need to know my rowkey in advance. (A
> hash like MD5 generates a long string, which causes some performance
> impact because my rowkey was getting long.)
>
> In my lab testing I found that a number of regions were created, but one
> region, with start row !, was empty.
> As far as I can observe, I created my table pre-split on these characters,
> and no data arrived for keys starting with !.
>
> Is there any way to distribute data equally across all regions, given that
> I know what my salt is? It is fixed, like !@#$%.
>
> Thanks
> Manjeet
>
> On Sat, Sep 10, 2016 at 7:04 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Given an ingestion rate of 7 GB/hour, you would have ~5000 GB of data per
>> month.
>> That's about 500 regions.
>> You would run out of ASCII characters in the position of #.
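
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 30-day month is an assumption, and the 10 GB region size is the default mentioned elsewhere in this thread. The count of 95 printable ASCII characters (0x20 to 0x7E) is included to show why a single-character salt cannot name that many regions.

```python
# Assumed inputs: 7 GB/hour ingest (from the thread), a 30-day month, and
# the 10 GB default region size mentioned elsewhere in the thread.
GB_PER_HOUR = 7
HOURS_PER_MONTH = 24 * 30
REGION_SIZE_GB = 10

monthly_gb = GB_PER_HOUR * HOURS_PER_MONTH
regions_per_month = monthly_gb / REGION_SIZE_GB

# Printable ASCII runs from 0x20 (space) to 0x7E (~) inclusive.
printable_ascii = 0x7E - 0x20 + 1

print(monthly_gb, "GB per month")
print(regions_per_month, "regions per month")
print("single-character salts available:", printable_ascii)
```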
>>
>> Since a mobile number is personally identifiable information, it is not
>> prudent to use it directly in the row key.
>> You can search for commonly accepted practice on the internet.
>> If you use a hash of the mobile number, you would avoid hot spotting.
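
A minimal sketch of a hash-derived salt, assuming a fixed bucket count and a two-digit prefix (both hypothetical choices, as are the function names). The prefix is computed from the mobile number, so it is known in advance for any key, and it stays short even though MD5 itself is long. Note that this sketch keeps the mobile number in the key, in the style of #_9999999999, so it addresses only hot spotting and not the personal-information concern.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count; must match the pre-split

def salted_row_key(mobile: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prefix the mobile number with a short, deterministic bucket id."""
    digest = hashlib.md5(mobile.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}_{mobile}"

def split_points(num_buckets: int = NUM_BUCKETS):
    """Split keys "01" .. "15": bucket "00" falls into the first region."""
    return [f"{b:02d}" for b in range(1, num_buckets)]

print(salted_row_key("9999999999"))
print(split_points())
```

Because the split keys start at "01" rather than "00", the first region (empty start key up to "01") receives bucket "00" and is not left empty.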
>>
>> More detail about your use case would help in providing a better answer.
>>
>> Cheers
>>
>> On Fri, Sep 9, 2016 at 6:09 PM, Manjeet Singh <manjeet.chandhok@gmail.com>
>> wrote:
>>
>> > Thanks, Ted, for the links. They will help determine how regions split,
>> > what the size should be, etc., which is really helpful.
>> > But can you tell me whether my understanding, as asked in the trailing
>> > mail, is correct?
>> > I know what the salt will be based on the mobile number coming in my data.
>> > So assume the salt for mobile number 9999999999 is #,
>> > so my rowkey is #_9999999999.
>> > As I know my exact rowkey in advance, I can distribute my data across the
>> > cluster to avoid hotspotting, and I want to distribute my data equally
>> > across the cluster.
>> > So is it mandatory to create the table according to my splits?
>> >
>> > Thanks
>> > Manjeet
>> >
>> > On Sat, Sep 10, 2016 at 6:26 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> >
>> > > Please take a look at:
>> > >
>> > > http://hbase.apache.org/book.html#table_schema_rules_of_thumb
>> > > http://hbase.apache.org/book.html#arch.regions.size
>> > > http://hbase.apache.org/book.html#ops.capacity.regions
>> > > http://hbase.apache.org/book.html#ops.capacity.regions.total
>> > >
>> > > On Fri, Sep 9, 2016 at 5:35 PM, Manjeet Singh
>> > > <manjeet.chandhok@gmail.com> wrote:
>> > >
>> > > > Yes, it is for weekdays.
>> > > > Yes, the default is 10 GB, so what is the way/formula to know what the
>> > > > size of the RS should be?
>> > > > On 9 Sep 2016 19:03, "Ted Yu" <yuzhihong@gmail.com> wrote:
>> > > >
>> > > > > Can you clarify whether the incoming data rate is for weekdays?
>> > > > >
>> > > > > At 6-7 GB/hour, you need to set a larger region size.
>> > > > > Default is 10GB.
>> > > > >
>> > > > > If you know roughly how the key space would be filled, presplit
>> > > > > your table accordingly.
>> > > > >
>> > > > > On Thu, Sep 8, 2016 at 11:24 PM, Manjeet Singh
>> > > > > <manjeet.chandhok@gmail.com> wrote:
>> > > > >
>> > > > > > Hi All
>> > > > > >
>> > > > > > I have some basic questions. Can anyone help me out?
>> > > > > >
>> > > > > > Q1. This is my understanding: to perform splitting, I need to
>> > > > > > create the table like below:
>> > > > > > create 'test_table', 'c1', SPLITS => ['#', '!', '$']
>> > > > > >
>> > > > > > and I have to design the row key this way:
>> > > > > > #_123456789
>> > > > > > !_123456789
>> > > > > > $_123456789
>> > > > > >
>> > > > > > so that my data is distributed across the cluster.
>> > > > > >
>> > > > > > My requirement is very simple: I want to distribute data equally
>> > > > > > across regions based on my rowkey only.
>> > > > > >
>> > > > > > So please correct me if I am missing anything.
>> > > > > >
>> > > > > >
>> > > > > > Q2. If I have 5 regions on each of my region servers and I give
>> > > > > > each 100 MB of space using the hbase.hregion.max.filesize property,
>> > > > > > what will happen when all my regions fill up with 100 MB of data?
>> > > > > > Please note that I have a cron job scheduled every weekend and my
>> > > > > > incoming data rate is 6-7 GB/hour, so my regions fill up very fast.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Thanks
>> > > > > > Manjeet
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > luv all
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > luv all
>> >
>>
>
>
>
> --
> luv all
>



-- 
luv all
