hbase-user mailing list archives

From Allan Yang <allan...@apache.org>
Subject Re: How does hbase find regionservers for scans
Date Sun, 16 Jul 2017 15:08:40 GMT
If you want the rows that start with "0", you should use
scan 'dbi_based_data', {STARTROW => '0', STOPROW => '1', COLUMNS =>
'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
Similarly, if you want the rows that start with '28':
scan 'dbi_based_data', {STARTROW => '28', STOPROW => '29', COLUMNS =>
'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
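
The STOPROW in both commands is just the prefix with its last byte incremented. A minimal sketch of that rule in plain Ruby follows; prefix_stop_row is a hypothetical helper written for illustration, not an HBase shell command or API.

```ruby
# Hypothetical helper (not part of HBase): return the smallest row key that
# sorts after every key beginning with `prefix`, for use as an exclusive STOPROW.
def prefix_stop_row(prefix)
  bytes = prefix.bytes
  # A trailing 0xFF byte cannot be incremented, so drop such bytes first.
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  # Nothing left to increment: an empty stop row means "scan to the end of the table".
  return '' if bytes.empty?
  bytes[-1] += 1
  bytes.pack('C*')
end

puts prefix_stop_row('0')
puts prefix_stop_row('28')
```

For the two prefixes in this thread it yields '1' and '29', the same stop rows used in the commands above.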

The query you ran becomes a full table scan, which is very
inefficient.
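
To illustrate why the row bounds matter, here is a rough sketch in plain Ruby of which regions a scan has to visit. It assumes only that each region covers a contiguous key range beginning at its start key and that a scan without STARTROW begins at the first region. The region start keys and the regions_for_scan helper are made up for the example; this is not how the HBase client is implemented.

```ruby
# Hypothetical, lexically sorted region start keys; the first region starts at ''.
REGION_START_KEYS = ['', '1', '2', '28', '3', '5'].freeze

# Return the indexes of the regions whose key range overlaps [start_row, stop_row).
# An empty stop_row means "no upper bound", as in an unbounded scan.
def regions_for_scan(start_keys, start_row, stop_row)
  start_keys.each_index.select do |i|
    region_start = start_keys[i]
    region_end = start_keys[i + 1] # nil for the last region, which is open-ended
    starts_before_stop = stop_row.empty? || region_start < stop_row
    ends_after_start = region_end.nil? || region_end > start_row
    starts_before_stop && ends_after_start
  end
end

p regions_for_scan(REGION_START_KEYS, '28', '29') # bounded scan for prefix '28'
p regions_for_scan(REGION_START_KEYS, '', '')     # no STARTROW/STOPROW at all
```

With these made-up keys, the bounded scan touches a single region, while the unbounded one has every region as a candidate and relies on the filter to discard rows.
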
As for why the second query timed out, there can be many reasons. One
possible reason is that you have too many delete markers for rows with
prefix '28'. A major compaction would resolve that case.
But before finding out why, I think changing these queries is the first
thing that needs to be done.


Best Regards
Allan Yang

2017-07-15 11:14 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:

> I wonder what time unit you were using.
>
> From the example in hbase-shell/src/main/ruby/shell/commands/scan.rb :
>
>   hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
>
> You can see the time range having much smaller values.
>
> Please look at ROWPREFIXFILTER example in the same scan.rb
>
> If you check the table UI for dbi_based_data, you would see the start key
> of each region.
> From there it is easy to pinpoint which server hosts the relevant region.
>
> Cheers
>
> On Fri, Jul 14, 2017 at 7:51 PM, S L <slouie.at.work@gmail.com> wrote:
>
> > Sorry if this is a basic question.  How does hbase determine which
> > regionserver the rows are supposed to be stored on?  My rowkey looks like
> > hash_servername_timestamp, e.g.
> >
> > 33_myserver.mydomain.com_1234567890
> >
> > If I run the following command:
> >
> > scan 'dbi_based_data', {FILTER => "PrefixFilter('0')", COLUMNS =>
> > 'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
> >
> > I get all the rows that start with "0".  Since hbase stores things in
> > lexical order, it seems like all rows that were stored lexically first
> > gets returned.
> >
> > However, if I run the following command, hbase times out.  Even if I
> > extend the timeout period to 3 minutes, it still times out.
> >
> > scan 'dbi_based_data', {FILTER => "PrefixFilter('28')", COLUMNS =>
> > 'raw_data:processlist', TIMERANGE => [1499205600000, 1499206200000]}
> >
> > It seems like if it was any other prefix other than "0", it times out
> > (like above prefix = 28).  I don't understand why it would timeout since
> > it should be able to calculate which region/regionserver it should go to
> > since I gave it the prefix to use.
> >
> >
> > I performed "hbase hbck" and it says that
> >
> > 9 region servers are alive, 2 are dead
> >
> > # of total regions is 15850 for the db but there's only 350 for the table
> > I'm querying.  There are 0 inconsistencies so the status is "OK".
> >
> > Thanks in advance for any help you can give me.
> >
>
