hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Major Compaction Concerns
Date Sun, 08 Jan 2012 21:07:35 GMT
HBASE-3051 (Compaction at the granularity of a column-family) is marked as
implemented by HBASE-3796
<https://issues.apache.org/jira/browse/HBASE-3796>, which is in 0.92
(0.92 RC3 is coming out soon).

Please see http://hbase.apache.org/book/regions.arch.html, section 8.7.5.5,
which refers to
http://hbase.apache.org/book/important_configurations.html#managed.compactions

Cheers

On Sun, Jan 8, 2012 at 12:55 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:

> Well, I'm very interested in digging further. I can also tell that the
> number of logs grows very high very fast, and of course a flush is
> triggered, adding more store files. Soon the high number of store files
> triggers compactions and delays flushing (the default delay is 90000 ms).
> The files are small, so a major compaction is not needed, but a minor one
> is. Nevertheless, the code ignores the disabled automatic major compaction
> and promotes the files to a major compaction.
> I think I need to tune the log file size, the compaction threshold, and
> the max number of store files. Do you have some recommendations?
> Btw, the compaction takes about 1 min 40 sec for a store size of roughly
> 900MB. Is that normal?
> One thing that does not help in this story is that I have 2 column
> families, and each RS manages 100 regions; each CF grows at a different
> speed. Is there a version of HBase that handles such a case better (not
> flushing both CFs if not needed)?
>
> I will review the release notes of the versions you suggested and open
> issues for the enhancements we discussed.
>
> Thanks
> Cheers.
> On Jan 8, 2012 10:22 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>
> > Your request in the first paragraph below deserves a JIRA.
> >
> > For 2.b I agree a bug should be filed.
> >
> > For major compaction, adding more logs on the region server side should
> > help you understand the situation better, assuming you are interested in
> > digging further.
> > Please upgrade to 0.90.5, or wait for the 0.90.6 release, which is
> > slated for Jan. 19th.
> >
> > After the upgrade, the logs and code would be more pertinent to the tip
> > of the 0.90 branch.
> >
> > Thanks for summarizing your findings.
> >
> > On Sun, Jan 8, 2012 at 12:04 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> >
> > > In fact, I think that for 2.a the current implementation is misleading.
> > > Creating a connection and getting the configuration from the connection
> > > should return the configuration of the cluster.
> > > Requesting the configuration used to build an object should return the
> > > configuration set on the object.
> > > Additionally, there should be a new method like getConfigurations() or
> > > getClusterConfigurations() returning a map of server info to
> > > configuration. Another option is to add a getConfiguration() method on
> > > HRegionServer and HMaster, returning the configuration object used by the
> > > RegionServer or Master.
> > >
> > > Regarding 2.b, yes, I tried, but it did not return the setting from the
> > > cluster configuration. (Again, the server has a non-default
> > > configuration and the table was not configured with specific values, so
> > > the cluster configuration should apply to the table object.) So I see it
> > > as problematic.
> > >
> > > Mikael.s
> > >  On Jan 8, 2012 7:54 PM, <yuzhihong@gmail.com> wrote:
> > >
> > > > About 2b, have you tried getting the major compaction setting from the
> > > > column descriptor?
> > > >
> > > > For 2a, what you requested would require adding new methods to the
> > > > HBaseConfiguration class. Currently the configuration on the client
> > > > class path is used.
> > > >
> > > > Cheers
> > > >
> > > >
> > > >
> > > > On Jan 8, 2012, at 9:28 AM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > >
> > > > > Hi Ted,
> > > > > First, thanks for answering; regarding the JIRAs, I will file them.
> > > > > Second, it seems that I did not explain myself correctly regarding
> > > > > 2.a. Like you, I do not expect that a configuration set on my client
> > > > > will be propagated to the cluster, but I do expect that if I set a
> > > > > configuration on a server, then calling connection.getConfiguration()
> > > > > from a client will give me the configuration from the cluster.
> > > > > Currently the configuration returned is the client config.
> > > > > So the problem is that you have no way to check the configuration of
> > > > > a cluster.
> > > > > I would expect some API that returns the cluster config, or even a
> > > > > map <serverInfo, config>, so that it is easy to check for cluster
> > > > > problems in code.
> > > > >
> > > > > 2.b. I know this code, and I tried to validate it. I set
> > > > > "hbase.hregion.majorcompaction" to "0" in the server config, then
> > > > > started the server (cluster). Since this parameter is not visible at
> > > > > the cluster level from the UI or from JMX, I tried to get the value
> > > > > from the client (to see that the cluster is using it):
> > > > >
> > > > > HTableDescriptor hTableDescriptor =
> > > > >     conn.getHTableDescriptor(Bytes.toBytes("my table"));
> > > > >
> > > > > hTableDescriptor.getValue("hbase.hregion.majorcompaction")
> > > > >
> > > > > but I still got 24h (and not the value set in the config)! That was
> > > > > my problem from the beginning! ==> The config set on the server side
> > > > > does not propagate into the table/column family.
> > > > > Mikael.S
> > > > >
> > > > > On Sun, Jan 8, 2012 at 7:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > >> I am not an expert in the major compaction feature.
> > > > >> Let me try to answer the questions in #2.
> > > > >>
> > > > >> 2.a
> > > > >>> If I set the property via the configuration, shouldn't the whole
> > > > >>> cluster be aware of it?
> > > > >>
> > > > >> There are multiple clients connecting to one cluster. I wouldn't
> > > > >> expect values in the configuration (m_hbConfig) to propagate onto the
> > > > >> cluster.
> > > > >>
> > > > >> 2.b
> > > > >> Store.getNextMajorCompactTime() shows that "hbase.hregion.majorcompaction"
> > > > >> can be specified per column family:
> > > > >>
> > > > >> long getNextMajorCompactTime() {
> > > > >>   // default = 24hrs
> > > > >>   long ret = conf.getLong(HConstants.MAJOR_COMPACTION_PERIOD, 1000*60*60*24);
> > > > >>   if (family.getValue(HConstants.MAJOR_COMPACTION_PERIOD) != null) {
> > > > >>
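The quoted snippet stops at the per-family check, but it already shows the lookup order: a value set on the column family overrides the server's `hbase.hregion.majorcompaction`, which itself defaults to 24 hours. The sketch below models only that order, with plain maps standing in for the server-side `Configuration` and the column family's values (both stand-ins are assumptions; the rest of the real method is left out). It also illustrates 2.b: the family map holds only what was explicitly set on the family, so reading the descriptor from a client cannot reveal the value in the server's hbase-site.xml.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup order in the quoted snippet; plain maps
// stand in for the server Configuration and the column family's values.
class MajorCompactionPeriod {
    static final String MAJOR_COMPACTION_PERIOD = "hbase.hregion.majorcompaction";
    static final long DEFAULT_PERIOD_MS = 1000L * 60 * 60 * 24; // 24 hours

    static long effectivePeriod(Map<String, String> serverConf,
                                Map<String, String> familyValues) {
        // 1. Server-side configuration (hbase-site.xml), else the 24h default.
        String fromConf = serverConf.get(MAJOR_COMPACTION_PERIOD);
        long ret = (fromConf != null) ? Long.parseLong(fromConf) : DEFAULT_PERIOD_MS;
        // 2. A value set explicitly on the column family overrides it.
        String fromFamily = familyValues.get(MAJOR_COMPACTION_PERIOD);
        if (fromFamily != null) {
            ret = Long.parseLong(fromFamily);
        }
        return ret;
    }

    public static void main(String[] args) {
        Map<String, String> serverConf = new HashMap<>();
        serverConf.put(MAJOR_COMPACTION_PERIOD, "0");   // disabled in hbase-site.xml
        Map<String, String> family = new HashMap<>();   // nothing set on the family
        System.out.println(effectivePeriod(serverConf, family)); // prints 0

        family.put(MAJOR_COMPACTION_PERIOD, "604800000"); // 7 days set on the family
        System.out.println(effectivePeriod(serverConf, family)); // prints 604800000
    }
}
```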
> > > > >> 2.d
> > > > >>> d. I also tried to set the parameter via the hbase shell, but setting
> > > > >>> such properties is not supported. (Do you plan to add such support to
> > > > >>> the shell?)
> > > > >>
> > > > >> This is a good idea. Please open a JIRA.
> > > > >>
> > > > >> For #5, HBASE-3965 is an improvement and doesn't have a patch yet.
> > > > >>
> > > > >> Allow me to quote Alan Kay: 'The best way to predict the future is to
> > > > >> invent it.'
> > > > >>
> > > > >> Once we have a patch, we can always backport it to 0.92 after some
> > > > >> people have verified the improvement.
> > > > >>
> > > > >>> 6. In case a (major) compaction is running, it seems there is no way
> > > > >>> to stop it. Do you plan to add such a feature?
> > > > >>
> > > > >> Again, logging a JIRA would provide a good starting point for
> > > > >> discussion.
> > > > >>
> > > > >> Thanks for the verification work and suggestions, Mikael.
> > > > >>
> > > > >> On Sun, Jan 8, 2012 at 7:27 AM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > > >>
> > > > >>> I forgot to mention, I'm using HBase 0.90.1
> > > > >>>
> > > > >>> Regards,
> > > > >>> Mikael.S
> > > > >>>
> > > > >>> On Sun, Jan 8, 2012 at 5:25 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> > > > >>>
> > > > >>>> Hi
> > > > >>>>
> > > > >>>> I have some concerns regarding major compactions below...
> > > > >>>>
> > > > >>>>   1. According to best practices from the mailing list and from the
> > > > >>>>   book, automatic major compaction should be disabled. This can be
> > > > >>>>   done by setting the property ‘hbase.hregion.majorcompaction’ to ‘0’.
> > > > >>>>   Nevertheless, even after doing this I STILL see “major compaction”
> > > > >>>>   messages in the logs, so it is unclear how I can manage major
> > > > >>>>   compactions. (The system has heavy inserts, uniform across the
> > > > >>>>   cluster, and major compaction affects the performance of the system.)
> > > > >>>>   If I'm not wrong, the code shows that even if not requested, and
> > > > >>>>   even if the indicator is set to '0' (no automatic major compaction),
> > > > >>>>   a major compaction can be triggered by the code when all store
> > > > >>>>   files are candidates for a compaction (from
> > > > >>>>   Store.compact(final boolean forceMajor)).
> > > > >>>>   Shouldn't the code add a condition that automatic major compaction
> > > > >>>>   is disabled??
> > > > >>>>
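The behaviour described in point 1 can be modelled as two small predicates: one where a compaction that selects every store file is promoted to major regardless of the configured period, and one with the guard proposed above, where that promotion only happens while periodic major compaction is enabled (period > 0). This is a hypothetical simplification for discussion, not HBase's `Store.compact` code, and the method names are made up.

```java
import java.util.List;

// Hypothetical simplification of point 1; not HBase's Store.compact() code.
class CompactionPromotion {

    /** As described in the message: selecting every store file means "major". */
    static boolean isMajorAsObserved(List<Long> storeFiles, List<Long> selected,
                                     boolean forceMajor) {
        return forceMajor || selected.size() == storeFiles.size();
    }

    /** With the proposed guard: promote only if periodic major compaction is on. */
    static boolean isMajorWithProposedGuard(List<Long> storeFiles, List<Long> selected,
                                            boolean forceMajor, long majorPeriodMs) {
        if (forceMajor) {
            return true; // an explicit major_compact request is still honoured
        }
        boolean periodicMajorEnabled = majorPeriodMs > 0;
        return periodicMajorEnabled && selected.size() == storeFiles.size();
    }

    public static void main(String[] args) {
        List<Long> storeFiles = List.of(12L, 9L, 10L); // three small files (sizes in MB)
        List<Long> selected = storeFiles;              // the minor selection took them all
        System.out.println(isMajorAsObserved(storeFiles, selected, false));           // prints true
        System.out.println(isMajorWithProposedGuard(storeFiles, selected, false, 0)); // prints false
    }
}
```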
> > > > >>>>   2. I tried to check the parameter ‘hbase.hregion.majorcompaction’ at
> > > > >>>>   runtime using several approaches, to validate that the server
> > > > >>>>   indeed loaded the parameter.
> > > > >>>>
> > > > >>>> a. Using a connection created from the local config
> > > > >>>>
> > > > >>>> conn = (HConnection) HConnectionManager.getConnection(m_hbConfig);
> > > > >>>>
> > > > >>>> conn.getConfiguration().get("hbase.hregion.majorcompaction")
> > > > >>>>
> > > > >>>> returns the parameter from the local config and not from the cluster.
> > > > >>>> Is it a bug? If I set the property via the configuration, shouldn't
> > > > >>>> the whole cluster be aware of it? (supposing that the connection is
> > > > >>>> indeed connected to the cluster)
> > > > >>>>
> > > > >>>> b. Fetching the property from the table descriptor
> > > > >>>>
> > > > >>>> HTableDescriptor hTableDescriptor =
> > > > >>>>     conn.getHTableDescriptor(Bytes.toBytes("my table"));
> > > > >>>>
> > > > >>>> hTableDescriptor.getValue("hbase.hregion.majorcompaction")
> > > > >>>>
> > > > >>>> This returns the default parameter value (1 day), not the parameter
> > > > >>>> from the configuration (on the cluster). It seems to be a bug, isn't
> > > > >>>> it? (The parameter from the config should be the default if it is not
> > > > >>>> set at the table level.)
> > > > >>>>
> > > > >>>> c. The only way I could set the parameter to 0 and really see it is
> > > > >>>> via the Admin API, updating the table descriptor or the column
> > > > >>>> descriptor. Now I could see the parameter in the web UI. So is this
> > > > >>>> the only way to set the parameter correctly? If the parameter is set
> > > > >>>> via the configuration file, shouldn't the web UI show it on any table
> > > > >>>> created?
> > > > >>>>
> > > > >>>> d. I also tried to set the parameter via the hbase shell, but setting
> > > > >>>> such properties is not supported. (Do you plan to add such support to
> > > > >>>> the shell?)
> > > > >>>>
> > > > >>>> e. In general, is it possible to get the configuration used by the
> > > > >>>> servers via the API? (at cluster/server level)
> > > > >>>>
> > > > >>>> 3. I ran major compaction requests both from the shell and from the
> > > > >>>> API, but since both are async there is no progress indication.
> > > > >>>> Neither JMX nor the web UI helps here, since you don't know whether a
> > > > >>>> compaction task is running. Tailing the logs is not an efficient way
> > > > >>>> to do this either. The point is that I would like to automate the
> > > > >>>> process and avoid a compaction storm. So I want to do it region by
> > > > >>>> region, but if I don't know when a compaction started/ended I can't
> > > > >>>> automate it.
> > > > >>>>
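Since the major compaction request is asynchronous and 0.90 exposes no progress indicator, a rolling driver has to infer completion from something it can observe. A minimal sketch, assuming a hypothetical `RegionStats` hook that can request a major compaction (a real tool might back this with `HBaseAdmin.majorCompact(regionName)`) and report per-region store and store file counts (no such interface exists in HBase; the counts would have to come from JMX, the web UI or logs), and treating "one file per store" as "done". The region names and the fake are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RollingMajorCompaction {

    /** Hypothetical hooks; a real tool would back these with admin calls, JMX or logs. */
    interface RegionStats {
        void requestMajorCompaction(String region); // asynchronous, like major_compact
        int storeCount(String region);               // one store per column family
        int storeFileCount(String region);           // total store files in the region
    }

    /** Compacts regions one at a time and returns them in completion order. */
    static List<String> run(RegionStats stats, List<String> regions,
                            long pollMs, int maxPolls) {
        List<String> done = new ArrayList<>();
        for (String region : regions) {
            stats.requestMajorCompaction(region);
            int polls = 0;
            // Treat "one file per store" as finished; heavy writes may delay this.
            while (stats.storeFileCount(region) > stats.storeCount(region)) {
                if (++polls > maxPolls) {
                    throw new IllegalStateException("timed out waiting for " + region);
                }
                try {
                    Thread.sleep(pollMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted waiting for " + region, e);
                }
            }
            done.add(region);
        }
        return done;
    }

    /** Fake for trying the driver: 2 stores, 5 files until the compaction "finishes". */
    static RegionStats fakeStats(int pollsUntilDone) {
        Map<String, Integer> pollsLeft = new HashMap<>();
        return new RegionStats() {
            public void requestMajorCompaction(String r) { pollsLeft.put(r, pollsUntilDone); }
            public int storeCount(String r) { return 2; }
            public int storeFileCount(String r) {
                int left = pollsLeft.merge(r, -1, Integer::sum);
                return left >= 0 ? 5 : 2;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(run(fakeStats(2), List.of("region-a", "region-b"), 0, 10));
        // prints [region-a, region-b]
    }
}
```

Under heavy inserts, flushes can add store files while the driver waits, so the one-file-per-store condition may be reached late or not at all; the `maxPolls` bound keeps the driver from waiting forever in that case.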
> > > > >>>> 4. In case there are no compaction files in the queue (but you still
> > > > >>>> have more than 1 store file per store, e.g. a minor compaction just
> > > > >>>> finished), invoking major_compact will indeed decrease the number of
> > > > >>>> store files, but the compaction queue will remain at 0 during the
> > > > >>>> compaction task. (Shouldn't the compaction queue increase by the
> > > > >>>> number of files to compact and be reduced when the task ends?)
> > > > >>>>
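One plausible reading of point 4, offered as an assumption rather than a description of the 0.90 code: the compaction queue holds pending requests (one per region, not one per file), and a worker removes a request from the queue before it starts compacting, so the reported queue length drops to 0 while the compaction is still running. The toy model below shows that effect, plus an explicit in-progress counter as one hypothetical way to expose the running state:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model (an assumption, not the 0.90 compaction thread): requests wait in
// a queue; a worker dequeues one before compacting, so only pending work counts.
class CompactionQueueModel {
    private final Queue<String> pending = new ArrayDeque<>();
    private int inProgress = 0; // hypothetical extra metric for running compactions

    void request(String region) {
        pending.add(region);
    }

    int queueSize() {
        return pending.size();
    }

    int inProgress() {
        return inProgress;
    }

    /** Takes the next request off the queue and marks it running; null if none. */
    String startNext() {
        String region = pending.poll(); // leaves the queue before the work starts
        if (region != null) {
            inProgress++;
        }
        return region;
    }

    /** Marks one running compaction as finished. */
    void finish() {
        if (inProgress > 0) {
            inProgress--;
        }
    }

    public static void main(String[] args) {
        CompactionQueueModel q = new CompactionQueueModel();
        q.request("region-a");
        q.startNext(); // the compaction is now running
        System.out.println(q.queueSize() + " queued, " + q.inProgress() + " running");
        // prints: 0 queued, 1 running
    }
}
```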
> > > > >>>>
> > > > >>>> 5. I already saw HBASE-3965 for getting the status of a major
> > > > >>>> compaction; nevertheless, it has been removed from 0.92. Is it
> > > > >>>> possible to put it back? Even sooner than 0.92?
> > > > >>>>
> > > > >>>> 6. In case a (major) compaction is running, it seems there is no way
> > > > >>>> to stop it. Do you plan to add such a feature?
> > > > >>>>
> > > > >>>> 7. Do you plan to add functionality via JMX (starting/stopping
> > > > >>>> compactions, splitting, ...)?
> > > > >>>>
> > > > >>>> 8. Finally, there were some requests for allowing custom compaction;
> > > > >>>> part of this was provided via the RegionObserver in HBASE-2001.
> > > > >>>> Nevertheless, do you consider adding support for custom compaction
> > > > >>>> (a real pluggable compaction strategy, not just an observer)?
> > > > >>>>
> > > > >>>>
> > > > >>>> Regards,
> > > > >>>> Mikael.S
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Mikael.S
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Mikael.S
> > > >
> > >
> >
>
