hbase-user mailing list archives

From Manish Maheshwari <mylogi...@gmail.com>
Subject Re: HBase - Count Rows in Regions and Region Servers
Date Fri, 26 Aug 2016 22:18:42 GMT
Thanks Ted.

On Fri, Aug 26, 2016 at 3:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For #1, please look at the following method in HTable.java:
>
>   public NavigableMap<HRegionInfo, ServerName> getRegionLocations() throws IOException {
>
> Cheers
>
> On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari <myloginid@gmail.com>
> wrote:
>
> > Thanks Rahul.
> >
> > 1 - I understand the idea of listing the usage on each of the disks that we
> > have HBase running on for that table. However, how do I map the Nodes to
> > Regions? I looked at RegionLocator - getStartEndKeys. But these just give
> > me the values and not the hostnames where each region is currently running.
> > Is there a way to map the Region to the Node?
> >
> > 2 - Some of our row sizes vary quite a bit depending on the number of
> > updates to the row. This will give us a rough idea of the size of the
> > Region, but not the number of Rows. Is there a way to get both? Apologies
> > if I am bothering too much.
> >
> > Thanks,
> > Manish
> >
> >
> >
> >
> >
> > On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani <rahul.gidwani@gmail.com>
> > wrote:
> >
> > > If you want to see which regionservers are currently hot, then JMX would be
> > > the best way to get that data.
> > >
> > > If you want to see overall what is hot, you can do this without the use of
> > > a scan (it will be a pretty decent estimate).
> > >
> > > You can do:
> > >
> > > hdfs dfs -du /hbase/data/default/<table_you_care_about>/
> > >
> > > With that data you can create a Map<EncodedRegionName, SizeInBytes>.
> > >
> > > Then you can use the RegionLocator to find which region resides on which
> > > machine.
> > >
> > > That will tell you the overall skew of your data in terms of raw bytes.
> > >
> > > Should be a pretty decent estimate and a lot faster than scanning your
> > > table, provided your table / cluster is sufficiently large.
> > >
> > > Hope that helps.
> > > rahul
> > >
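A minimal sketch of the approach rahul describes above: turn the `hdfs dfs -du` output into a Map<EncodedRegionName, SizeInBytes>, then sum the bytes per server. Several things here are assumptions and not from the thread: the class and method names are hypothetical, the size is taken to be the first column and the path the last column of each du line, region directories are assumed to be named by a 32-character hex encoded name, and the region-to-hostname map is passed in as plain strings (it would come from RegionLocator or similar).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase.
final class RegionSkew {
    // Assumption: region directories are named by a 32-character hex encoded region name.
    private static final Pattern ENCODED_NAME = Pattern.compile("[0-9a-f]{32}");

    /** Parses lines of `hdfs dfs -du` output into encoded region name -> size in bytes. */
    static Map<String, Long> parseDu(List<String> duLines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (String line : duLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // blank or malformed line
            }
            // Assumption: first column is the size, last column is the path.
            String path = cols[cols.length - 1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (!ENCODED_NAME.matcher(name).matches()) {
                continue; // skip entries that are not region directories
            }
            sizes.put(name, Long.parseLong(cols[0]));
        }
        return sizes;
    }

    /** Sums region sizes per server, given encoded region name -> hostname. */
    static Map<String, Long> bytesPerServer(Map<String, Long> sizes,
                                            Map<String, String> regionToServer) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            // Regions with no known location are grouped under "unknown".
            String server = regionToServer.getOrDefault(e.getKey(), "unknown");
            totals.merge(server, e.getValue(), Long::sum);
        }
        return totals;
    }
}
```

As noted in the thread, this measures skew in raw bytes only; it says nothing about row counts, and the region-to-server mapping can change at any time if regions move.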
> > > On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Have you looked at the /jmx endpoint on the servers?
> > > > Below is a sample w.r.t. the metrics that would be of interest to you:
> > > >
> > > > "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount": 0,
> > > >
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops": 0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min": 0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max": 0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean": 0.0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median": 0.0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile": 0.0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile": 0.0,
> > > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile": 0.0,
> > > >
> > > > "Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount": 0,
> > > > "Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount": 0,
> > > > "Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount": 0,
> > > >
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops": 0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min": 0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max": 0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean": 0.0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median": 0.0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile": 0.0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile": 0.0,
> > > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile": 0.0,
> > > >
> > > > "Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount": 0,
> > > > "Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount": 0,
> > > >
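A rough sketch of how the per-region metric names in Ted's sample could be grouped to compare load across regions. It assumes the key layout seen in the sample (Namespace_<ns>_table_<table>_region_<32-hex encoded name>_metric_<metric>) and that the /jmx JSON has already been decoded into a plain map of attribute name to number; the class and method names are hypothetical, and which metrics to add together is the caller's choice.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase.
final class RegionMetrics {
    // Assumption: key layout as in the sample above, with a 32-character hex region name.
    private static final Pattern KEY = Pattern.compile(
        "Namespace_(.+?)_table_(.+)_region_([0-9a-f]{32})_metric_(.+)");

    /** Sums the selected metrics per encoded region name; other attributes are ignored. */
    static Map<String, Long> sumByRegion(Map<String, ? extends Number> attrs,
                                         Set<String> wantedMetrics) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, ? extends Number> e : attrs.entrySet()) {
            Matcher m = KEY.matcher(e.getKey());
            if (!m.matches() || !wantedMetrics.contains(m.group(4))) {
                continue; // not a per-region metric, or not one we asked for
            }
            totals.merge(m.group(3), e.getValue().longValue(), Long::sum);
        }
        return totals;
    }
}
```

Passing, for example, get_num_ops, scanNext_num_ops and mutateCount as the wanted metrics gives one operation total per region on that server, which can then be compared across servers.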
> > > > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari
> > > > <myloginid@gmail.com> wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > I understand the region crash/migration/splitting impact. Currently we
> > > > > have hotspotting on a few region servers. I am trying to collect the
> > > > > row stats at region server and region levels to see how bad the skew
> > > > > of the data is.
> > > > >
> > > > > Manish
> > > > >
> > > > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Can you elaborate on your use case?
> > > > > >
> > > > > > Suppose row A is on server B, after you retrieve row A, the region
> > > > > > for row A gets moved to server C (load balancer or server crash).
> > > > > > Server B would no longer be relevant.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari
> > > > > > <myloginid@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I looked at the HBase Count functionality to count rows in a
> > > > > > > Table. Is there a way that we can count the number of rows in
> > > > > > > Regions & Region Servers? When we use an HBase scan, we don't get
> > > > > > > the Region ID or Region Server of the row. Is there a way to do
> > > > > > > this via Scans?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Manish
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
