hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: input split process
Date Wed, 31 May 2017 13:55:09 GMT
Rajesh:
Currently there are no (persisted) statistics on the distribution of rowkey
values within a region.

You can sample the rows to get an approximation of that distribution.
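
To illustrate the sampling idea, here is a minimal sketch in plain Java. It is
not HBase code: the class RowKeySampling and both method names are hypothetical,
and row keys are modeled as Strings, whose ordering matches HBase's byte
ordering only for ASCII keys. It assumes you can iterate the row keys of a
region once (for example from a Scan bounded by the region's start and end
keys), keeps a fixed-size uniform sample with reservoir sampling, and then
takes evenly spaced quantiles of the sorted sample as approximate split points.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of HBase.
final class RowKeySampling {

    /**
     * Reservoir sampling (Algorithm R): one pass over the keys,
     * keeping a uniform random sample of at most k of them.
     */
    static List<String> reservoirSample(Iterable<String> rowKeys, int k, Random rng) {
        List<String> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (String key : rowKeys) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k keys.
                reservoir.add(key);
            } else {
                // Replace a random slot with probability k / seen.
                long j = (long) (rng.nextDouble() * seen); // in [0, seen)
                if (j < k) {
                    reservoir.set((int) j, key);
                }
            }
        }
        return reservoir;
    }

    /**
     * Returns up to numSplits - 1 boundary keys taken at evenly spaced
     * quantiles of the sorted sample. Duplicate boundaries are dropped.
     */
    static List<String> approximateSplitPoints(List<String> sample, int numSplits) {
        List<String> boundaries = new ArrayList<>();
        if (sample.isEmpty()) {
            return boundaries;
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        for (int i = 1; i < numSplits; i++) {
            int idx = (int) ((long) i * sorted.size() / numSplits);
            String candidate = sorted.get(idx);
            boolean duplicate = !boundaries.isEmpty()
                    && boundaries.get(boundaries.size() - 1).equals(candidate);
            if (!duplicate) {
                boundaries.add(candidate);
            }
        }
        return boundaries;
    }
}
```

The boundaries are only as good as the sample: a larger k gives a closer
approximation at the cost of more memory, and a skewed region may still yield
uneven splits.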

Cheers

On Wed, May 31, 2017 at 5:05 AM, Rajeshkumar J <rajeshkumarit8292@gmail.com> wrote:

> So we will know only start and end rowkey values of a region. We can't know
> the other rowkey values within that region.
>
> On Wed, May 31, 2017 at 2:59 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > If you look at TableInputFormatBase#getSplits(), you would see that the
> > following information can be retrieved:
> >
> > Table
> > RegionLocator
> > Admin
> > StartEndKeys (region boundaries)
> >
> > You can also take a look at calculateRebalancedSplits() to see how it
> > rebalances the InputSplits.
> >
> > FYI
> >
> > On Tue, May 30, 2017 at 11:53 PM, Rajeshkumar J <rajeshkumarit8292@gmail.com> wrote:
> >
> > > Hi,
> > >
> > >    I want to write a custom input split for my HBase data. Can anyone
> > > tell me which values are known during the split process: only the
> > > rowkey values, or any others?
> > >
> > > Thanks
> > >
> >
>
