hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Major Compaction Concerns
Date Mon, 09 Jan 2012 00:19:02 GMT
For #2 below, I suggest validating again against 0.90.5; 0.90.1 is pretty
old.

Cheers

On Sun, Jan 8, 2012 at 3:05 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:

> Ted hi
>
> 1. thanks for pointing out HBASE-3051, Compaction at the granularity of a
> column-family, it seems promising
>
> 2. Regarding manual management of compaction - it is exactly what i tried
> to do, and that is how i arrived at all these findings. *In short there is no
> way to disable major compaction from running automatically* (point #1 in
> original email), should a JIRA be opened?
>
> 3. I have opened the following ones
> HBASE-5146  - Hbase Shell - allow setting config properties
> HBASE-5147 - Compaction/Major compaction operation from shell/API/JMX
> HBASE-5148 - Compaction property at the server level are not propagated at
> the table level
> HBASE-5149 - getConfiguration() implementation is misleading
>
> Regards,
> Mikael.S
>
> On Sun, Jan 8, 2012 at 11:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > HBASE-3051, Compaction at the granularity of a column-family, is marked
> > implemented by HBASE-3796
> > <https://issues.apache.org/jira/browse/HBASE-3796>, which is in 0.92
> > (0.92 RC3 is coming out soon)
> >
> > Please see http://hbase.apache.org/book/regions.arch.html, 8.7.5.5 which
> > refers to
> > http://hbase.apache.org/book/important_configurations.html#managed.compactions
> >
> > Cheers
> >
> > On Sun, Jan 8, 2012 at 12:55 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> >
> > > Well I'm very interested to dig further. I can also tell that the number
> > > of logs is getting very high very fast, and of course a flush is triggered,
> > > adding more store files. Very quickly the high number of store files
> > > triggers compaction and delays the flushing (default delay is 90000 ms).
> > > The files are small in size; major compaction is not needed, but minor is.
> > > Nevertheless the code ignores the disabled automatic compaction and
> > > promotes files to major compaction.
> > > I think I need to play with the log file size, the compaction threshold,
> > > and the max number of store files. Do you have some recommendations?
> > > Btw the compaction takes about 1 min 40 sec for a store size of 900 MB
> > > +/-. Is it normal?
> > > One thing that does not help in this story is that I have 2 column
> > > families and each RS manages 100 of regions; each cf grows at a different
> > > speed.
> > > Is there a version of hbase that handles such a case better (not flushing
> > > both cf if not needed)?
> > >
> > > I will review the release notes of the versions you suggested and open
> > > the issues/enhancements we discussed.
> > >
> > > Thanks
> > > Cheers.
> > > On Jan 8, 2012 10:22 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> > >
> > > > Your request in first paragraph below deserves a JIRA.
> > > >
> > > > For 2.b I agree a bug should be filed.
> > > >
> > > > For major compaction, adding more logs on region server side should
> > > > help you understand the situation better - assuming you have interest to
> > > > dig further.
> > > > Please upgrade to 0.90.5, or you can wait for 0.90.6 release which is
> > > > slated for Jan. 19th.
> > > >
> > > > After upgrade, the logs and code would be more pertinent to the tip of
> > > > 0.90 branch.
> > > >
> > > > Thanks for summarizing your findings.
> > > >
> > > > On Sun, Jan 8, 2012 at 12:04 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > >
> > > > > In fact I think that for 2.a the current implementation is
> > > > > misleading.
> > > > > Creating a connection and getting the configuration from the
> > > > > connection should return the configuration of the cluster.
> > > > > Requesting the configuration used to build an object should return
> > > > > the configuration set on the object.
> > > > > Additionally there should be a new method like getConfigurations() or
> > > > > getClusterConfigurations() returning a map of serverinfo and
> > > > > configuration.  Another option is to add on the HRegionServer and
> > > > > HMaster a method getConfiguration() returning the configuration object
> > > > > used by the RegionServer or Master.
> > > > >
> > > > > Regarding 2.b yes I tried but it did not return the setting from the
> > > > > cluster configuration (again, the server has a non-default
> > > > > configuration and the table was not configured with specific values, so
> > > > > the cluster configuration should apply to the table object). So I see
> > > > > it as problematic.
> > > > >
> > > > > Mikael.s
> > > > >  On Jan 8, 2012 7:54 PM, <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > About 2b, have you tried getting the major compaction setting from
> > > > > > column descriptor ?
> > > > > >
> > > > > > For 2a, what you requested would result in new methods of
> > > > > > HBaseConfiguration class to be added. Currently the configuration on
> > > > > > client class path would be used.
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Jan 8, 2012, at 9:28 AM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > > > >
> > > > > > > Ted hi
> > > > > > > First thanks for answering; regarding the JIRAs, i will file them.
> > > > > > > Second, it seems that i did not explain myself correctly regarding
> > > > > > > 2.a. - Like you, i do not expect that a configuration set on my
> > > > > > > client will be propagated to the cluster, but i do expect that if i
> > > > > > > set a configuration on a server then doing
> > > > > > > connection.getConfiguration() from a client i will get the
> > > > > > > configuration from the cluster.
> > > > > > > Currently the configuration returned is from the client config.
> > > > > > > So the problem is that you have no way to check the configuration
> > > > > > > of a cluster.
> > > > > > > I would expect to have some API to return the cluster config and
> > > > > > > even getting a map <serverInfo, config> so it can be easy to check
> > > > > > > cluster problems using code.
> > > > > > >
> > > > > > > 2.b. I know this code, and i tried to validate it. I set
> > > > > > > "hbase.hregion.majorcompaction" to "0" in the server config, then
> > > > > > > started the server (cluster). Since this parameter is not visible
> > > > > > > at the cluster level from the UI or from JMX, I tried to get the
> > > > > > > value from the client (to see that the cluster is using it)
> > > > > > >
> > > > > > > *HTableDescriptor hTableDescriptor =
> > > > > > > conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > > > > > >
> > > > > > > *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > > > > > but i still got 24h (and not the value set in the config)! that was
> > > > > > > my problem from the beginning! ==> Using the config (on the server
> > > > > > > side) will not propagate into the table/column family
> > > > > > >
> > > > > > > Mikael.S
> > > > > > >
> > > > > > > On Sun, Jan 8, 2012 at 7:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >
> > > > > > > >> I am not an expert in the major compaction feature.
> > > > > > >> Let me try to answer questions in #2.
> > > > > > >>
> > > > > > >> 2.a
> > > > > > > >>> If I set the property via the configuration, shouldn’t the whole
> > > > > > > >>> cluster be aware of it?
> > > > > > > >>
> > > > > > > >> There're multiple clients connecting to one cluster. I wouldn't
> > > > > > > >> expect values in the configuration (m_hbConfig) to propagate onto
> > > > > > > >> the cluster.
> > > > > > >>
> > > > > > >> 2.b
> > > > > > > >> Store.getNextMajorCompactTime() shows that
> > > > > > > >> "hbase.hregion.majorcompaction" can be specified per column family:
> > > > > > > >>
> > > > > > > >> long getNextMajorCompactTime() {
> > > > > > > >>   // default = 24hrs
> > > > > > > >>   long ret = conf.getLong(HConstants.MAJOR_COMPACTION_PERIOD, 1000*60*60*24);
> > > > > > > >>   if (family.getValue(HConstants.MAJOR_COMPACTION_PERIOD) != null) {
> > > > > > >>
> > > > > > >> 2.d
> > > > > > > >>> d. I also tried to set the parameter via the hbase shell, but
> > > > > > > >>> setting such properties is not supported. (do you plan to add
> > > > > > > >>> such support via the shell?)
> > > > > > >>
> > > > > > >> This is a good idea. Please open a JIRA.
> > > > > > >>
> > > > > > > >> For #5, HBASE-3965 is an improvement and doesn't have a patch yet.
> > > > > > > >>
> > > > > > > >> Allow me to quote Alan Kay: 'The best way to predict the future
> > > > > > > >> is to invent it.'
> > > > > > > >>
> > > > > > > >> Once we have a patch, we can always backport it to 0.92 after
> > > > > > > >> some people have verified the improvement.
> > > > > > >>
> > > > > > > >>> 6.       In case a compaction (major) is running it seems there
> > > > > > > >>> is no way to stop it. Do you plan to add such a feature?
> > > > > > > >>
> > > > > > > >> Again, logging a JIRA would provide a good starting point for
> > > > > > > >> discussion.
> > > > > > >>
> > > > > > >> Thanks for the verification work and suggestions, Mikael.
> > > > > > >>
> > > > > > > >> On Sun, Jan 8, 2012 at 7:27 AM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > > > > >>
> > > > > > >>> I forgot to mention, I'm using HBase 0.90.1
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> Mikael.S
> > > > > > >>>
> > > > > > > >>> On Sun, Jan 8, 2012 at 5:25 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>>> Hi
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>>
> > > > > > > >>>> I have some concerns regarding major compactions below...
> > > > > > >>>>
> > > > > > >>>>
> > > > > > > >>>>   1. According to best practices from the mailing list and from the
> > > > > > > >>>>   book, automatic major compaction should be disabled. This can be
> > > > > > > >>>>   done by setting the property ‘hbase.hregion.majorcompaction’ to ‘0’.
> > > > > > > >>>>   Nevertheless, even after doing this I STILL see “major compaction”
> > > > > > > >>>>   messages in the logs, so it is unclear how I can manage major
> > > > > > > >>>>   compactions. (The system has heavy inserts, uniformly across the
> > > > > > > >>>>   cluster, and major compaction affects the performance of the
> > > > > > > >>>>   system.)
> > > > > > > >>>>   If I'm not wrong it seems from the code that even if not requested,
> > > > > > > >>>>   and even if the indicator is set to '0' (no automatic major
> > > > > > > >>>>   compaction), major compaction can be triggered by the code in case
> > > > > > > >>>>   all store files are candidates for a compaction (from
> > > > > > > >>>>   Store.compact(final boolean forceMajor)).
> > > > > > > >>>>   Shouldn't the code add a condition that automatic major compaction
> > > > > > > >>>>   is disabled??
> > > > > > >>>>
> > > > > > > >>>>   2. I tried to check the parameter ‘hbase.hregion.majorcompaction’ at
> > > > > > > >>>>   runtime using several approaches - to validate that the server
> > > > > > > >>>>   indeed loaded the parameter.
> > > > > > >>>>
> > > > > > >>>> a. Using a connection created from local config
> > > > > > >>>>
> > > > > > > >>>> *conn = (HConnection) HConnectionManager.getConnection(m_hbConfig);*
> > > > > > > >>>>
> > > > > > > >>>> *conn.getConfiguration().get("hbase.hregion.majorcompaction")*
> > > > > > > >>>>
> > > > > > > >>>> returns the parameter from the local config and not from the cluster.
> > > > > > > >>>> Is it a bug?
> > > > > > > >>>> If I set the property via the configuration, shouldn’t the whole
> > > > > > > >>>> cluster be aware of it? (supposing that the connection indeed
> > > > > > > >>>> connected to the cluster)
> > > > > > >>>>
> > > > > > >>>> b.  fetching the property from the table descriptor
> > > > > > >>>>
> > > > > > > >>>> *HTableDescriptor hTableDescriptor =
> > > > > > > >>>> conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > > > > > > >>>>
> > > > > > > >>>> *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > > > > > >>>>
> > > > > > > >>>> This returns the default parameter value (1 day), not the parameter
> > > > > > > >>>> from the configuration (on the cluster). It seems to be a bug,
> > > > > > > >>>> doesn’t it? (the parameter from the config should be the default if
> > > > > > > >>>> not set at the table level)
> > > > > > >>>>
> > > > > > > >>>> c. The only way I could set the parameter to 0 and really see it is
> > > > > > > >>>> via the Admin API, updating the table descriptor or the column
> > > > > > > >>>> descriptor. Now I could see the parameter on the web UI. So is it the
> > > > > > > >>>> only way to set the parameter correctly? If setting the parameter via
> > > > > > > >>>> the configuration file, shouldn’t the web UI show this on any table
> > > > > > > >>>> created?
> > > > > > >>>>
> > > > > > > >>>> d. I also tried to set the parameter via the hbase shell, but
> > > > > > > >>>> setting such properties is not supported. (do you plan to add such
> > > > > > > >>>> support via the shell?)
> > > > > > >>>>
> > > > > > > >>>> e. Generally, is it possible to get via the API the configuration
> > > > > > > >>>> used by the servers? (at cluster/server level)
> > > > > > >>>>
> > > > > > > >>>>    3.  I ran major compaction requests both from the shell and from
> > > > > > > >>>> the API, but since both are async there is no progress indication.
> > > > > > > >>>> Neither JMX nor the web UI will help here since you don’t know if a
> > > > > > > >>>> compaction task is running. Tailing the logs is not an efficient way
> > > > > > > >>>> to do this either. The point is that I would like to automate the
> > > > > > > >>>> process and avoid a compaction storm. So I want to do that region by
> > > > > > > >>>> region, but if I don’t know when a compaction started/ended I can’t
> > > > > > > >>>> automate it.
> > > > > > >>>>
> > > > > > > >>>> 4.       In case there are no files in the compaction queue (but you
> > > > > > > >>>> still have more than 1 storefile per store, e.g. a minor compaction
> > > > > > > >>>> just finished), invoking major_compact will indeed decrease the
> > > > > > > >>>> number of store files, but the compaction queue will remain at 0
> > > > > > > >>>> during the compaction task (shouldn’t the compaction queue increase
> > > > > > > >>>> by the number of files to compact and be reduced when the task ends?)
> > > > > > >>>>
> > > > > > >>>>
> > > > > > > >>>> 5.       I already saw HBASE-3965 for getting the status of major
> > > > > > > >>>> compaction; nevertheless it has been removed from 0.92. Is it
> > > > > > > >>>> possible to put it back? Even sooner than 0.92?
> > > > > > >>>>
> > > > > > > >>>> 6.       In case a compaction (major) is running it seems there is
> > > > > > > >>>> no way to stop it. Do you plan to add such a feature?
> > > > > > >>>>
> > > > > > > >>>> 7.       Do you plan to add functionality via JMX (starting/stopping
> > > > > > > >>>> compaction, splitting....)
> > > > > > >>>>
> > > > > > > >>>> 8.       Finally, there were some requests for allowing custom
> > > > > > > >>>> compaction; part of this was given via the RegionObserver in
> > > > > > > >>>> HBASE-2001. Nevertheless, do you consider adding support for custom
> > > > > > > >>>> compaction (providing a real pluggable compaction strategy, not just
> > > > > > > >>>> an observer)?
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> Regards,
> > > > > > >>>> Mikael.S
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Mikael.S
> > > > > > >>>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Mikael.S
> > > > > >
> > > > >
> > > >
> > >
> >
>
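
The lookup order discussed under 2.b in the thread above (a server-wide value
of hbase.hregion.majorcompaction, optionally overridden per column family) can
be modelled in a few lines of plain Java. This is only a sketch, not HBase
code: the class and method names are hypothetical, plain maps stand in for the
server Configuration and the column family descriptor, and the override step
is an assumption, because the quoted excerpt of
Store.getNextMajorCompactTime() is cut off right after the null check.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup order discussed in the thread; not HBase code.
public class MajorCompactionPeriod {
    static final String KEY = "hbase.hregion.majorcompaction";
    // 24 hours in milliseconds, the default shown in the quoted excerpt.
    static final long DEFAULT_PERIOD_MS = 1000L * 60 * 60 * 24;

    // serverConf stands in for the region server configuration,
    // familyValues for the values stored on a column family descriptor.
    static long effectivePeriodMs(Map<String, String> serverConf,
                                  Map<String, String> familyValues) {
        long period = DEFAULT_PERIOD_MS;
        String fromServer = serverConf.get(KEY);
        if (fromServer != null) {
            period = Long.parseLong(fromServer);
        }
        // Assumption: a value set on the column family overrides the server value.
        String fromFamily = familyValues.get(KEY);
        if (fromFamily != null) {
            period = Long.parseLong(fromFamily);
        }
        return period;
    }

    public static void main(String[] args) {
        Map<String, String> serverConf = new HashMap<>();
        serverConf.put(KEY, "0"); // time-based major compaction disabled server-wide
        Map<String, String> noFamilyOverride = Collections.emptyMap();
        System.out.println(effectivePeriodMs(serverConf, noFamilyOverride));
    }
}
```

In this model a server-side "0" is honored even when the table or column
family descriptor carries no value of its own, which is the behaviour Mikael
expected to be able to verify from the client.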
