hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Region is out of bounds
Date Thu, 04 Dec 2014 20:43:26 GMT
Andrew,

Which HBase version did you run your test on?

This issue probably no longer exists in the latest Apache releases, but it
still exists in older, yet still actively used, versions of CDH, HDP, etc. We
discovered it during large data set loading (100s of GB) on our cluster (4
nodes).

-Vladimir

On Thu, Dec 4, 2014 at 10:23 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Actually, I have set hbase.hstore.blockingStoreFiles to exactly 200 in
> testing :-), but I must not have generated sufficient load to encounter the
> issue you are seeing. Maybe it would be possible to adapt one of the ingest
> integration tests to trigger this problem? Set blockingStoreFiles to 200 or
> more and tune down the region size to 128K or similar. If it's reproducible
> like that, please open a JIRA.
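>
> Roughly, something like the following could serve as a starting point - a
> sketch only, not one of the shipped ingest integration tests. The table and
> family names and the row count are made up, the client calls are the
> 0.96/0.98-era API, and the blockingStoreFiles / region size overrides are
> assumed to already be in hbase-site.xml on the cluster side:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.HColumnDescriptor;
>   import org.apache.hadoop.hbase.HTableDescriptor;
>   import org.apache.hadoop.hbase.TableName;
>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class BlockingStoreFilesRepro {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       // Assumed cluster-side settings (hbase-site.xml):
>       //   hbase.hstore.blockingStoreFiles = 200 (or more)
>       //   hbase.hregion.max.filesize      = 131072 (128K)
>
>       HBaseAdmin admin = new HBaseAdmin(conf);
>       HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("repro"));
>       desc.setMaxFileSize(128 * 1024L);           // tiny regions, frequent splits
>       desc.addFamily(new HColumnDescriptor("f"));
>       admin.createTable(desc);
>       admin.close();
>
>       // Keep writing until store files pile up and regions try to split.
>       HTable table = new HTable(conf, "repro");
>       byte[] fam = Bytes.toBytes("f");
>       byte[] qual = Bytes.toBytes("q");
>       byte[] value = new byte[1024];
>       for (long i = 0; i < 10000000L; i++) {
>         Put p = new Put(Bytes.toBytes(String.format("row-%012d", i)));
>         p.add(fam, qual, value);
>         table.put(p);
>       }
>       table.flushCommits();
>       table.close();
>     }
>   }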
>
> On Wed, Dec 3, 2014 at 9:07 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> wrote:
>
> > Kevin,
> >
> > Thank you for your response. This is not a question of how to correctly
> > configure an HBase cluster for write-heavy workloads. This is an internal
> > HBase issue - something is wrong in the default logic of the compaction
> > selection algorithm in 0.94-0.98. It seems that nobody has ever tested
> > importing data with a very high hbase.hstore.blockingStoreFiles value (200
> > in our case).
> >
> > -Vladimir Rodionov
> >
> > On Wed, Dec 3, 2014 at 6:38 AM, Kevin O'dell <kevin.odell@cloudera.com>
> > wrote:
> >
> > > Vladimir,
> > >
> > >   I know you said, "do not ask me why", but I am going to have to ask
> > > you why. The fact that you are doing this (this being blocking store
> > > files > 200) tells me there is something, or multiple somethings, wrong
> > > with your cluster setup. A couple of things come to mind:
> > >
> > > * During this heavy write period, could we use bulk loads? If so, this
> > > should solve almost all of your problems (see the sketch after this
> > > list).
> > >
> > > * A 1GB region size is WAY too small; if you are pushing the volume of
> > > data you are talking about, I would recommend 10-20GB region sizes. This
> > > should also help keep your region count smaller, which will result in
> > > more optimal writes.
> > >
> > > * Your cluster may be undersized; if you are setting the blocking
> > > threshold that high, you may be pushing too much data for your cluster
> > > overall.
> > >
> > > Would you be so kind as to pass me a few pieces of information?
> > >
> > > 1.) Cluster size
> > > 2.) Average region count per RS
> > > 3.) Heap size, Memstore global settings, and block cache settings
> > > 4.) a RS log to pastebin and a time frame of "high writes"
> > >
> > > I can probably make some solid suggestions for you based on the above
> > > data.
> > >
> > > On Wed, Dec 3, 2014 at 1:04 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> > > wrote:
> > >
> > > > This is what we observed in our environment(s)
> > > >
> > > > The issue exists in CDH 4.5 and 5.1, HDP 2.1, and MapR 4.
> > > >
> > > > If someone sets the # of blocking store files way above the default
> > > > value, say to 200, to avoid write stalls during intensive data loading
> > > > (do not ask me why we do this), then one of the regions grows
> > > > indefinitely and takes up more than 99% of the overall table.
> > > >
> > > > It can't be split because it still has orphaned reference files. Some
> > > > of the reference files are able to avoid compactions for a long time,
> > > > obviously.
> > > >
> > > > The split policy is IncreasingToUpperBound, and the max region size is
> > > > 1G. I do my tests mostly on CDH4.5, but all the other distros seem to
> > > > have the same issue.
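> > > >
> > > > For concreteness, the relevant settings in my test setup are below
> > > > (standard hbase-site.xml keys; the values are the ones described
> > > > above, and the split policy class name is the stock one shipped with
> > > > HBase):
> > > >
> > > >   hbase.hstore.blockingStoreFiles = 200
> > > >   hbase.hregion.max.filesize = 1073741824   (1G)
> > > >   hbase.regionserver.region.split.policy =
> > > >     org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy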
> > > >
> > > > My attempt to forcefully add reference files to the compaction list
> > > > in Store.requestCompaction() when the region exceeds the recommended
> > > > maximum size did not work out well - we got some weird results in our
> > > > own test cases (but the HBase tests are OK: small, medium and large).
> > > >
> > > > What is so special about these reference files? Any ideas on what can
> > > > be done here to fix the issue?
> > > >
> > > > -Vladimir Rodionov
> > > >
> > >
> > >
> > >
> > > --
> > > Kevin O'Dell
> > > Systems Engineer, Cloudera
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
