hbase-dev mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: [DISCUSS] IncreasingToUpperBoundRegionSplitPolicy can lead to unpredictable large regions size
Date Fri, 26 Jun 2020 09:31:16 GMT
>
> It's supposed to be controlling how big the region is?
>
Precisely. It may not make a big difference for compaction itself, but
regions that grow larger than expected might have further implications for
overall RS resource usage. Given the feedback provided here, I guess we can
proceed with the current proposal from HBASE-24530 all the way to maintenance
branches (it doesn't change IncreasingToUpperBoundRegionSplitPolicy
behaviour, but adds a new policy that actually respects the region max size
for the whole region). We can then fix IncreasingToUpperBoundRegionSplitPolicy
on minor version branches, as suggested by Busbey.
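
To make the difference concrete, here is a minimal sketch in plain Java. It
is not the actual HBase code: the class and method names (SplitSizeCheck,
exceedsPerStore, exceedsWholeRegion) are made up for illustration, and the
10 GB limit and 8 GB store sizes are arbitrary example values. It contrasts
checking each store against the limit with checking the sum of all stores:

```java
import java.util.List;

public class SplitSizeCheck {

    // Hypothetical helper: true only if at least one single store is
    // larger than the limit (the per-store comparison discussed above).
    public static boolean exceedsPerStore(List<Long> storeSizes, long maxFileSize) {
        for (long size : storeSizes) {
            if (size > maxFileSize) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: true if the combined size of all stores is
    // larger than the limit (the whole-region comparison).
    public static boolean exceedsWholeRegion(List<Long> storeSizes, long maxFileSize) {
        long total = 0L;
        for (long size : storeSizes) {
            total += size;
        }
        return total > maxFileSize;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        long maxFileSize = 10 * gb;                        // example limit
        List<Long> stores = List.of(8 * gb, 8 * gb, 8 * gb); // three families

        System.out.println(exceedsPerStore(stores, maxFileSize));    // prints false
        System.out.println(exceedsWholeRegion(stores, maxFileSize)); // prints true
    }
}
```

With three 8 GB stores and a 10 GB limit, the per-store check never asks for
a split even though the region holds 24 GB, while the whole-region check does.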

On Wed, Jun 24, 2020 at 18:00, Andrew Purtell <apurtell@apache.org> wrote:

> It's supposed to be controlling how big the region is?
>
> On Wed, Jun 24, 2020 at 8:42 AM 张铎(Duo Zhang) <palomino219@gmail.com>
> wrote:
>
> > I think one of the goals of limiting the store file size is for compaction.
> > As long as we just do compactions per family, what is the actual problem if
> > the whole region is too big?
> >
> > On Wed, Jun 24, 2020 at 10:56 PM, Wellington Chevreuil
> > <wellington.chevreuil@gmail.com> wrote:
> >
> > > The expected behaviour for the property is well documented, so renaming
> > > and deprecation would rather be a separate task. HBASE-24530 should
> > > concern itself with making IncreasingToUpperBoundRegionSplitPolicy
> > > respect what the hbase.hregion.max.filesize and MAX_FILESIZE table-level
> > > descriptor documentation mandates, as well as being consistent with the
> > > other split policies' behaviour in relation to these properties.
> > >
> > > On Wed, Jun 24, 2020 at 08:00, Anoop John <anoop.hbase@gmail.com> wrote:
> > >
> > > > If we are also going to change (correct) hbase.hregion.max.filesize to
> > > > hbase.hregion.max.size (via a proper deprecation cycle) along with this
> > > > change, I am good.
> > > >
> > > > Anoop
> > > >
> > > > On Wed, Jun 24, 2020 at 1:29 AM Sean Busbey <busbey@apache.org> wrote:
> > > >
> > > > > Let's fix via approach #3. Get it done for next minor versions and then
> > > > > if folks aren't sure about principle of least surprise we can talk about
> > > > > whether it goes into maintenance releases.
> > > > >
> > > > > On Tue, Jun 23, 2020, 13:07 Andrew Purtell <apurtell@apache.org> wrote:
> > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > > violating those configs.
> > > > > >
> > > > > > Thank you for pointing this out. I feel even more strongly now this is
> > > > > > a bug.
> > > > > > I vote for #3.
> > > > > >
> > > > > > On Tue, Jun 23, 2020 at 2:42 AM Wellington Chevreuil <
> > > > > > wellington.chevreuil@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > > *hbase.hregion.max.size*.
> > > > > > > >
> > > > > > >
> > > > > > > The description for hbase.hregion.max.filesize very clearly states that
> > > > > > > it is the sum of all hfiles in the region that should not exceed this
> > > > > > > property value. And we do not always use *hbase.hregion.max.filesize*
> > > > > > > to determine the limit; a MAX_FILESIZE table-level descriptor may be
> > > > > > > used instead, whose description reads as below in the
> > > > > > > TableDescriptorBuilder javadoc:
> > > > > > >
> > > > > > >   /**
> > > > > > >    * Returns the maximum size upto which a region can grow to after which a
> > > > > > >    * region split is triggered. The region size is represented by the size of
> > > > > > >    * the biggest store file in that region.
> > > > > > >    *
> > > > > > >    * @return max hregion size for table, -1 if not set.
> > > > > > >    */
> > > > > > >
> > > > > > > Current IncreasingToUpperBoundRegionSplitPolicy implementation is
> > > > > > > violating those configs.
> > > > > > >
> > > > > > > Do we have a consensus on applying #3 for all active branches? If so, I
> > > > > > > would instruct HBASE-24530 to proceed as such.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jun 21, 2020 at 19:09, Andrew Purtell
> > > > > > > <andrew.purtell@gmail.com> wrote:
> > > > > > >
> > > > > > > > ‘Filesize’ and ‘size’ are ambiguous. They are open to interpretation
> > > > > > > > and I don’t see one as more clear than the other, other than to imply
> > > > > > > > something about file level measures being the determining factor. It
> > > > > > > > doesn’t convey more semantics beyond that, ie one file trips the limit
> > > > > > > > or the combined sizes of all files trips the limit. We can fix that
> > > > > > > > with clarifying documentation. While doing so we also have an
> > > > > > > > opportunity to fix something if our consensus is the current policy is
> > > > > > > > not the usual user expectation.
> > > > > > > >
> > > > > > > > So how suboptimal is it? Does a compatibility concern make sense if we
> > > > > > > > think this is just broken? Perhaps we can address all concerns by
> > > > > > > > making the change in next minor releases and then do those minor
> > > > > > > > releases soon.
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Jun 20, 2020, at 11:06 PM, Anoop John <anoop.hbase@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > I have a concern if we do #3 for all minor versions. That will be a
> > > > > > > > > major split behaviour change and can affect so much for tables with
> > > > > > > > > many CFs. If one adjusted the pre splits so as to avoid further region
> > > > > > > > > splits, that calc might go wrong once they migrate to new minor
> > > > > > > > > versions with this change right?
> > > > > > > > > The config name was/is hbase.hregion.max.*filesize* and never
> > > > > > > > > *hbase.hregion.max.size*. We will have HFiles at CF level and so a max
> > > > > > > > > filesize is applicable at CF level. So even this config name will
> > > > > > > > > create confusion once we change the calc to consider size at region
> > > > > > > > > level (Sum of sizes at CFs)
> > > > > > > > >
> > > > > > > > > Anoop
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >> On Fri, Jun 19, 2020 at 11:44 PM Viraj Jasani <vjasani@apache.org> wrote:
> > > > > > > > >>
> > > > > > > > >> Given that SteppingSplitPolicy is the default region split policy,
> > > > > > > > >> removal of IncreasingToUpperBoundRegionSplitPolicy is going to make
> > > > > > > > >> things more complex for master branch if we follow #2.
> > > > > > > > >> Hence, I believe we should better go with #3 for all.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> On 2020/06/19 17:52:27, Viraj Jasani <vjasani@apache.org> wrote:
> > > > > > > > >>> Can we do a mix of #2 and #3 i.e remove
> > > > > > > > >>> IncreasingToUpperBoundRegionSplitPolicy from master, and follow #3
> > > > > > > > >>> for branch-2 and all active release branches? If it breaks any
> > > > > > > > >>> compatibility rules, then we can go with #3 for all.
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> On 2020/06/19 17:33:14, Andrew Purtell <apurtell@apache.org> wrote:
> > > > > > > > >>>> I vote for #3, and it should be applied to all active code lines.
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>> On Fri, Jun 19, 2020 at 3:35 AM Wellington Chevreuil
> > > > > > > > >>>> <wellington.chevreuil@gmail.com> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> While going through the changes proposed on HBASE-24530, we
> > > > > > > > >>>>> observed IncreasingToUpperBoundRegionSplitPolicy compares
> > > > > > > > >>>>> hbase.hregion.max.filesize against individual stores within a
> > > > > > > > >>>>> region when deciding whether to split a region or not. For tables
> > > > > > > > >>>>> having multiple families, this can lead to regions much larger
> > > > > > > > >>>>> than what's defined by hbase.hregion.max.filesize.
> > > > > > > > >>>>>
> > > > > > > > >>>>> Current proposal on HBASE-24530 is to add an extra policy that
> > > > > > > > >>>>> actually compares the overall region size (combining all region
> > > > > > > > >>>>> stores sizes) against hbase.hregion.max.filesize, but I wonder if
> > > > > > > > >>>>> it really makes sense to keep a policy with current
> > > > > > > > >>>>> IncreasingToUpperBoundRegionSplitPolicy behaviour. Would like to
> > > > > > > > >>>>> hear folks opinions if we should take any of the below actions?
> > > > > > > > >>>>> 1) Leave IncreasingToUpperBoundRegionSplitPolicy as it is and just
> > > > > > > > >>>>> add the new policy proposed on HBASE-24530;
> > > > > > > > >>>>> 2) Make IncreasingToUpperBoundRegionSplitPolicy deprecated and
> > > > > > > > >>>>> remove it from master branch;
> > > > > > > > >>>>> 3) Change IncreasingToUpperBoundRegionSplitPolicy to actually
> > > > > > > > >>>>> implement the logic of the new policy proposed on HBASE-24530;
> > > > > > > > >>>>>
> > > > > > > > >>>>> My view is that the current IncreasingToUpperBoundRegionSplitPolicy
> > > > > > > > >>>>> behaviour is a bug, and I vote for #3.
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>>>
> > > > > > > > >>>> --
> > > > > > > > >>>> Best regards,
> > > > > > > > >>>> Andrew
> > > > > > > > >>>>
> > > > > > > > >>>> Words like orphans lost among the crosstalk, meaning torn from
> > > > > > > > >>>> truth's decrepit hands
> > > > > > > > >>>>   - A23, Crosstalk
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew
> > > > > >
> > > > > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > > > > decrepit hands
> > > > > >    - A23, Crosstalk
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>
