hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikael Sitruk <mikael.sit...@gmail.com>
Subject Re: Major Compaction Concerns
Date Sun, 08 Jan 2012 20:55:36 GMT
Well I'm very interested to dig further. I can also tell that the number of
log is getting very high very fast and of course a flush is triggered
adding more store files. Very fast the high number of store files trigger
compaction and delay the flushing (default delay is 90000 ms).  The files
are small in size, major compaction is not needed but minor yes.
Nevertheless the code ignore the disabled automatic compaction and promotes
files to major compaction.
I think I need to play with both the log file size the compaction threshold
and the Max number of stores file. Do you have some recommendations?
Btw the compaction take about 1min 40 sec for a store size of 900MB +/-. Is
it normal?
One thing that does not help in this story is that I have 2 column families
and each RS manages 100 of regions each cf growth with differents speed.
Is there a version of hbase handling better such case (not flushing both cf
if not needed to)?

I will review the release note of the versions you suggested and open
issues/enhancements we discuss.

Thanks
Cheers.
On Jan 8, 2012 10:22 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:

> Your request in first paragraph below deserves a JIRA.
>
> For 2.b I agree a bug should be filed.
>
> For major compaction, adding more logs on region server side should help
> you understand the situation better - assuming you have interest to dig
> further.
> Please upgrade to 0.90.5, or you can wait for 0.90.6 release which is
> slated for Jan. 19th.
>
> After upgrade, the logs and code would be more pertinent to the tip of 0.90
> branch.
>
> Thanks for summarizing your findings.
>
> On Sun, Jan 8, 2012 at 12:04 PM, Mikael Sitruk <mikael.sitruk@gmail.com
> >wrote:
>
> > In fact I think that for 2.a the current implementation is misleading.
> > Creating a connection and getting the configuration from the connection
> > should return the configuration of the cluster.
> > Requesting the configuration used to build an object should return the
> > configuration set on the object
> > Additionally it should be a new method like getConfigurations(), or
> > getClusterConfigurations() returning a map of serverinfo and
> > configuration.  Another option is to add on the HRegionServer and
> HMaster a
> > method getConfiguration() returning the configuration object used by the
> > RegionServer or Master
> >
> > Regarding 2.b yes I tried but it did not return the setting from the
> > cluster configuration (again server has non default configuration, table
> > was not configured with specific values then cluster configuration should
> > apply on the table object). So I see it as problematic.
> >
> > Mikael.s
> >  On Jan 8, 2012 7:54 PM, <yuzhihong@gmail.com> wrote:
> >
> > > About 2b, have you tried getting the major compaction setting from
> column
> > > descriptor ?
> > >
> > > For 2a, what you requested would result in new methods of
> > > HBaseConfiguration class to be added. Currently the configuration on
> > client
> > > class path would be used.
> > >
> > > Cheers
> > >
> > >
> > >
> > > On Jan 8, 2012, at 9:28 AM, Mikael Sitruk <mikael.sitruk@gmail.com>
> > wrote:
> > >
> > > > Ted hi
> > > > First thanks for answering, regarding the JIRA i will fill them
> > > > Second, it seems that i did not explain myself correctly regarding
> > 2.a. -
> > > > As you i do not expect that a configuration set on my client will be
> > > > propagated to the cluster, but i do expect that if i set a
> > configuration
> > > on
> > > > a server then doing connection.getConfiguration() from a client i
> will
> > > get
> > > > teh configuration from the cluster.
> > > > Currently the configuration returned is from the client config.
> > > > So the problem is that you have no way to check the configuration of
> a
> > > > cluster.
> > > > I would expect to have some API to return the cluster config and even
> > > > getting a map <serverInfo, config> so it can be easy to check cluster
> > > > problem using code.
> > > >
> > > > 2.b. I know this code, and i tried to validate it. I set in the
> server
> > > > config the "hbase.hregion.majorcompaction" to "0", then start the
> > server
> > > > (cluster). Since from the UI or from JMX this parameter is not
> visible
> > at
> > > > the cluster level, I try to get the value from the client (to see
> that
> > > the
> > > > cluster is using it)
> > > >
> > > > *HTableDescriptor hTableDescriptor =
> > > > conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > > >
> > > > *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > > but i still got 24h (and not the value set in the config)! that was
> my
> > > > problem from the beginning! ==> Using the config (on the server side)
> > > will
> > > > not propagate into the table/column family
> > > >
> > > > Mikael.S
> > > >
> > > > On Sun, Jan 8, 2012 at 7:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > >> I am not expert in major compaction feature.
> > > >> Let me try to answer questions in #2.
> > > >>
> > > >> 2.a
> > > >>> If I set the property via the configuration shouldn’t all the
> cluster
> > > be
> > > >>> aware of?
> > > >>
> > > >> There're multiple clients connecting to one cluster. I wouldn't
> expect
> > > >> values in the configuration (m_hbConfig) to propagate onto the
> > cluster.
> > > >>
> > > >> 2.b
> > > >> Store.getNextMajorCompactTime() shows that
> > > "hbase.hregion.majorcompaction"
> > > >> can be specified per column family:
> > > >>
> > > >> long getNextMajorCompactTime() {
> > > >>   // default = 24hrs
> > > >>   long ret = conf.getLong(HConstants.MAJOR_COMPACTION_PERIOD,
> > > >> 1000*60*60*24);
> > > >>   if (family.getValue(HConstants.MAJOR_COMPACTION_PERIOD) != null)
{
> > > >>
> > > >> 2.d
> > > >>> d. I tried also to setup the parameter via hbase shell but setting
> > such
> > > >>> properties is not supported. (do you plan to add such support
via
> the
> > > >>> shell?)
> > > >>
> > > >> This is a good idea. Please open a JIRA.
> > > >>
> > > >> For #5, HBASE-3965 is an improvement and doesn't have a patch yet.
> > > >>
> > > >> Allow me to quote Alan Kay: 'The best way to predict the future is
> to
> > > >> invent it.'
> > > >>
> > > >> Once we have a patch, we can always backport it to 0.92 after some
> > > people
> > > >> have verified the improvement.
> > > >>
> > > >>> 6.       In case a compaction (major) is running it seems there
is
> no
> > > way
> > > >>> to stop-it. Do you plan to add such feature?
> > > >>
> > > >> Again, logging a JIRA would provide a good starting point for
> > > discussion.
> > > >>
> > > >> Thanks for the verification work and suggestions, Mikael.
> > > >>
> > > >> On Sun, Jan 8, 2012 at 7:27 AM, Mikael Sitruk <
> > mikael.sitruk@gmail.com
> > > >>> wrote:
> > > >>
> > > >>> I forgot to mention, I'm using HBase 0.90.1
> > > >>>
> > > >>> Regards,
> > > >>> Mikael.S
> > > >>>
> > > >>> On Sun, Jan 8, 2012 at 5:25 PM, Mikael Sitruk <
> > mikael.sitruk@gmail.com
> > > >>>> wrote:
> > > >>>
> > > >>>> Hi
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> I have some concern regarding major compactions below...
> > > >>>>
> > > >>>>
> > > >>>>   1. According to best practices from the mailing list and
from
> the
> > > >>>>   book, automatic major compaction should be disabled. This
can be
> > > >> done
> > > >>> by
> > > >>>>   setting the property ‘hbase.hregion.majorcompaction’
to ‘0’.
> > > >>> Neverhteless
> > > >>>>   even after having doing this I STILL see “major compaction”
> > messages
> > > >>> in
> > > >>>>   logs. therefore it is unclear how can I manage major
> compactions.
> > > >> (The
> > > >>>>   system has heavy insert - uniformly on the cluster, and
major
> > > >>> compaction
> > > >>>>   affect the performance of the system).
> > > >>>>   If I'm not wrong it seems from the code that: even if not
> > requested
> > > >>>>   and even if the indicator is set to '0' (no automatic major
> > > >>> compaction),
> > > >>>>   major compaction can be triggered by the code in case all
store
> > > >> files
> > > >>> are
> > > >>>>   candidate for a compaction (from Store.compact(final boolean
> > > >>> forceMajor)).
> > > >>>>   Shouldn't the code add a condition that automatic major
> compaction
> > > >> is
> > > >>>>   disabled??
> > > >>>>
> > > >>>>   2. I tried to check the parameter
>  ‘hbase.hregion.majorcompaction’
> > > >> at
> > > >>>>   runtime using several approaches - to validate that the
server
> > > >> indeed
> > > >>>>   loaded the parameter.
> > > >>>>
> > > >>>> a. Using a connection created from local config
> > > >>>>
> > > >>>> *conn = (HConnection)
> HConnectionManager.getConnection(m_hbConfig);*
> > > >>>>
> > > >>>>
> *conn.getConfiguration().getString(“hbase.hregion.majorcompaction”)*
> > > >>>>
> > > >>>> returns the parameter from local config and not from cluster.
Is
> it
> > a
> > > >>> bug?
> > > >>>> If I set the property via the configuration shouldn’t all
the
> > cluster
> > > >> be
> > > >>>> aware of? (supposing that the connection indeed connected
to the
> > > >> cluster)
> > > >>>>
> > > >>>> b.  fetching the property from the table descriptor
> > > >>>>
> > > >>>> *HTableDescriptor hTableDescriptor =
> > > >>>> conn.getHTableDescriptor(Bytes.toBytes("my table"));*
> > > >>>>
> > > >>>> *hTableDescriptor.getValue("hbase.hregion.majorcompaction")*
> > > >>>>
> > > >>>> This will returns the default parameter value (1 day) not
the
> > > parameter
> > > >>>> from the configuration (on the cluster). It seems to be a
bug,
> isn’t
> > > >> it?
> > > >>>> (the parameter from the config, should be the default if not
set
> at
> > > the
> > > >>>> table level)
> > > >>>>
> > > >>>> c. The only way I could set the parameter to 0 and really
see it
> is
> > > via
> > > >>>> the Admin API, updating the table descriptor or the column
> > descriptor.
> > > >>> Now
> > > >>>> I could see the parameter on the web UI. So is it the only
way to
> > set
> > > >>>> correctly the parameter? If setting the parameter via the
> > > configuration
> > > >>>> file, shouldn’t the webUI show this on any table created?
> > > >>>>
> > > >>>> d. I tried also to setup the parameter via hbase shell but
setting
> > > such
> > > >>>> properties is not supported. (do you plan to add such support
via
> > the
> > > >>>> shell?)
> > > >>>>
> > > >>>> e. Generally is it possible to get via API the configuration
used
> by
> > > >> the
> > > >>>> servers? (at cluster/server level)
> > > >>>>
> > > >>>>    3.  I ran both major compaction  requests from the shell
or
> from
> > > >> API
> > > >>>> but since both are async there is no progress indication.
Neither
> > the
> > > >> JMX
> > > >>>> nor the Web will help here since you don’t know if a compaction
> task
> > > is
> > > >>>> running. Tailling the logs is not an efficient way to do this
> > neither.
> > > >>> The
> > > >>>> point is that I would like to automate the process and avoid
> > > compaction
> > > >>>> storm. So I want to do that region, region, but if I don’t
know
> > when a
> > > >>>> compaction started/ended I can’t automate it.
> > > >>>>
> > > >>>> 4.       In case there is no compaction files in queue (but
still
> > you
> > > >>> have
> > > >>>> more than 1 storefile per store e.g. minor compaction just
> finished)
> > > >> then
> > > >>>> invoking major_compact will indeed decrease the number of
store
> > files,
> > > >>> but
> > > >>>> the compaction queue will remain to 0 during the compaction
task
> > > >>> (shouldn’t
> > > >>>> the compaction queue increase by the number of file to compact
and
> > be
> > > >>>> reduced when the task ended?)
> > > >>>>
> > > >>>>
> > > >>>> 5.       I saw already HBASE-3965 for getting status of major
> > > >> compaction,
> > > >>>> nevertheless it has be removed from 0.92, is it possible to
put it
> > > >> back?
> > > >>>> Even sooner than 0.92?
> > > >>>>
> > > >>>> 6.       In case a compaction (major) is running it seems
there is
> > no
> > > >> way
> > > >>>> to stop-it. Do you plan to add such feature?
> > > >>>>
> > > >>>> 7.       Do you plan to add functionality via JMX
> (starting/stopping
> > > >>>> compaction, splitting....)
> > > >>>>
> > > >>>> 8.       Finally there were some request for allowing custom
> > > >> compaction,
> > > >>>> part of this was given via the RegionObserver in HBASE-2001,
> > > >> nevertheless
> > > >>>> do you consider adding support for custom compaction (providing
> real
> > > >>>> pluggable compaction stategy not just observer)?
> > > >>>>
> > > >>>>
> > > >>>> Regards,
> > > >>>> Mikael.S
> > > >>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Mikael.S
> > > >>>
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Mikael.S
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message