hbase-user mailing list archives

From S L <slouie.at.w...@gmail.com>
Subject Re: How does hbase find regionservers for scans
Date Tue, 18 Jul 2017 18:42:52 GMT
Thanks for the tips.  I'm running these queries to debug my program,
but this is good to know, especially regarding compaction.  My program
keeps timing out and hitting rowkeys that look like they should have
been removed when their TTL expired, but I couldn't prove it to anyone.

On Sun, Jul 16, 2017 at 8:08 AM, Allan Yang <allan163@apache.org> wrote:
> If you want rows that start with "0", you should use
> scan 'dbi_based_data', {STARTROW=>'0', STOPROW=>'1', COLUMNS =>
> 'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
> and similarly, if you want rows that start with '28':
> scan 'dbi_based_data', {STARTROW=>'28', STOPROW=>'29', COLUMNS =>
> 'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
>
> The query you made will become a full table scan, which is very
> inefficient.
> As for why the second query timed out, there can be many reasons. One
> possible reason is that you have too many delete markers for rows with
> prefix '28'. A major compaction will solve this case.
> But before finding out why, I think changing these queries is the first
> thing that needs to be done.
>
>
> Best Regards
> Allan Yang
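The STARTROW/STOPROW pairs Allan suggests ('0' to '1', '28' to '29') follow a simple rule: the exclusive stop row is the prefix with its last byte incremented. Below is a minimal Python sketch of that rule. The function name `prefix_to_row_range` is made up for illustration and is not part of HBase. It treats keys as raw bytes compared lexicographically, which is how HBase orders row keys.

```python
from typing import Tuple


def prefix_to_row_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every row key that begins with prefix.

    The stop row is exclusive. An empty stop row means "scan to the end of
    the table", mirroring how an open-ended scan is usually expressed.
    """
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        # Empty prefix, or a prefix made only of 0xFF bytes: no upper bound.
        return prefix, b""
    # Increment the last remaining byte to get the smallest key past the prefix.
    stop = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, stop


print(prefix_to_row_range(b"0"))   # (b'0', b'1')
print(prefix_to_row_range(b"28"))  # (b'28', b'29')
```

With explicit start and stop rows, only the regions whose key ranges overlap that window need to be read. A scan that carries nothing but a PrefixFilter has no such bounds, so every row is read and the filter is applied afterwards.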
>
> 2017-07-15 11:14 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>
>> I wonder what time unit you were using.
>>
>> From the example in hbase-shell/src/main/ruby/shell/commands/scan.rb :
>>
>>   hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
>>
>> You can see the time range having much smaller values.
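On the question of units: when the server assigns cell timestamps, HBase uses milliseconds since the Unix epoch, and TIMERANGE is compared against those values. Assuming this table relies on the default timestamps (the thread does not say), the 13-digit values in the original query can be decoded as below. The helper name `millis_to_utc` is made up for this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(ts_ms: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (UTC)."""
    # Integer timedelta arithmetic avoids floating-point rounding.
    return EPOCH + timedelta(milliseconds=ts_ms)


# The range used in the dbi_based_data scans: a ten-minute window.
print(millis_to_utc(1499205600000).isoformat())  # 2017-07-04T22:00:00+00:00
print(millis_to_utc(1499206200000).isoformat())  # 2017-07-04T22:10:00+00:00

# The 10-digit value from the scan.rb example, read as milliseconds,
# falls in January 1970; read as seconds it would fall in April 2011.
print(millis_to_utc(1303668804).isoformat())
```

If the application writes its own timestamps in some other unit, the range would need to be expressed in that unit instead.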
>>
>> Please look at ROWPREFIXFILTER example in the same scan.rb
>>
>> If you check the table UI for dbi_based_data, you would see the start key
>> of each region.
>> From there it is easy to pinpoint which server hosts the relevant region.
>>
>> Cheers
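Ted's point about region start keys can be illustrated with a small lookup: each region serves the key range from its start key up to the next region's start key, so finding the region for a row is a binary search over the sorted start keys. The Python sketch below uses an invented four-region layout with made-up server names. A real client gets this mapping from the hbase:meta table, not from a hard-coded list.

```python
import bisect

# Hypothetical layout: (region start key, hosting regionserver), sorted by start key.
# The first region always starts at the empty key.
REGIONS = [
    (b"", "rs1.example.com"),
    (b"1", "rs2.example.com"),
    (b"28", "rs3.example.com"),
    (b"5", "rs1.example.com"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region(start_keys, row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first start key strictly greater than row_key;
    # the region just before that position holds the row.
    return bisect.bisect_right(start_keys, row_key) - 1


for key in (b"0_myserver.mydomain.com_1234567890",
            b"28_myserver.mydomain.com_1234567890",
            b"33_myserver.mydomain.com_1234567890"):
    idx = locate_region(START_KEYS, key)
    print(key, "->", REGIONS[idx][1])
```

Note that keys are compared byte by byte, so "28" and "33" both sort before "5". An unpadded numeric hash prefix does not sort in numeric order.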
>>
>> On Fri, Jul 14, 2017 at 7:51 PM, S L <slouie.at.work@gmail.com> wrote:
>>
>> > Sorry if this is a basic question.  How does hbase determine which
>> > regionserver the rows are supposed to be stored on?  My rowkey looks like
>> > hash_servername_timestamp, e.g.
>> >
>> > 33_myserver.mydomain.com_1234567890
>> >
>> > If I run the following command:
>> >
>> > scan 'dbi_based_data', {FILTER => "PrefixFilter('0')", COLUMNS =>
>> > 'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
>> >
>> > I get all the rows that start with "0".  Since hbase stores things in
>> > lexical order, it seems like all rows that were stored lexically first
>> > get returned.
>> >
>> > However, if I run the following command, hbase times out.  Even if I
>> > extend the timeout period to 3 minutes, it still times out.
>> >
>> > scan 'dbi_based_data', {FILTER => "PrefixFilter('28')", COLUMNS =>
>> > 'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
>> >
>> > It seems like if it was any prefix other than "0", it times out (like
>> > the prefix 28 above).  I don't understand why it would time out, since
>> > it should be able to calculate which region/regionserver to go to from
>> > the prefix I gave it.
>> >
>> >
>> > I performed "hbase hbck" and it says that
>> >
>> > 9 region servers are alive, 2 are dead
>> >
>> > # of total regions is 15850 for the db but there's only 350 for the table
>> > I'm querying.  There are 0 inconsistencies so the status is "OK".
>> >
>> > Thanks in advance for any help you can give me.
>> >
>>
