hbase-dev mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2165) Improve fragmentation display and implementation
Date Thu, 04 Feb 2010 06:06:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829465#action_12829465 ]

Lars George commented on HBASE-2165:
------------------------------------

From HBASE-2181:

{quote}
Given the potentially low and misleading value of the metric, and how much effort must be
expended to collect it, I would argue that we should at least allow users to disable the
feature completely.

The first problem is that the data the metric delivers is not very useful. On any given busy
system, this value is often 100%. On a sample system here, 12% of the tables were at either
0 or 100%. Furthermore, the 100% value is not particularly informative. If a table has 100%
'fragmentation', it does not necessarily follow that the table is in dire need of compaction.
The HBase compaction code will generally keep at least 2 store files around: it refuses to
minor compact older and larger files, preferring to merge small files. Thus on a table taking
writes on all regions, the expected value of fragmentation is in fact 100%. And this is not
a bad thing either. Considering that compacting a 500GB table will take an hour and hammer
a cluster, misleading users into striving to get to 0% is not ideal.

The other major problem with this feature is that collecting the data is non-trivial on larger
clusters. I ran a test where I did an lsr on a Hadoop cluster, and generating 15k lines of
output pegged the namenode at over 100% CPU for a few seconds. On a cluster with 7000 regions,
we can easily have 14,000 files (2 store files per region is typical), so generating this
statistic causes load spikes on the namenode.

I would propose three courses of action:

- allow complete disablement of the feature, including the background thread and the UI display
- change the metric to mean '# of regions with > 5 store files'
- replace the metric with a completely different one that attempts to capture the spirit
of the intent but with less load.
{quote}
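
To make the difference between the two metrics concrete, here is a minimal sketch. It assumes that "fragmentation" means the percentage of regions holding more than one store file, which is my reading of the quoted text rather than a statement of the actual implementation. The function names and the per-region counts are hypothetical, and the input is a plain dict instead of anything read from HDFS.

```python
from typing import Dict


def fragmentation_pct(store_files: Dict[str, int]) -> float:
    """Percentage of regions with more than one store file (assumed definition)."""
    if not store_files:
        return 0.0
    fragmented = sum(1 for count in store_files.values() if count > 1)
    return 100.0 * fragmented / len(store_files)


def regions_over_threshold(store_files: Dict[str, int], threshold: int = 5) -> int:
    """Proposed alternative: number of regions with more than `threshold` store files."""
    return sum(1 for count in store_files.values() if count > threshold)


# Hypothetical table: 7000 regions, each with the typical 2 store files.
healthy = {f"region-{i}": 2 for i in range(7000)}

# Same table, but three regions have fallen behind on compaction.
backlog = dict(healthy)
for name in ("region-10", "region-20", "region-30"):
    backlog[name] = 8

for label, table in (("healthy", healthy), ("backlog", backlog)):
    print(label,
          "files:", sum(table.values()),
          "fragmentation %:", fragmentation_pct(table),
          "regions > 5 files:", regions_over_threshold(table))
```

Under these assumptions both tables report 100% fragmentation, while only the threshold count separates the table with a compaction backlog from the healthy one. The sketch says nothing about collection cost; the file counts would still have to come from somewhere cheaper than a recursive listing.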

> Improve fragmentation display and implementation
> ------------------------------------------------
>
>                 Key: HBASE-2165
>                 URL: https://issues.apache.org/jira/browse/HBASE-2165
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.4, 0.21.0
>            Reporter: Lars George
>            Priority: Minor
>             Fix For: 0.20.4, 0.21.0
>
>
> Improve by
> - moving the "blocking" FS scan into a thread so that the UI loads fast and initially
>   displays "n/a" but once it has completed the scan it displays the proper numbers
> - explaining what fragmentation means to the user (better hints or help text in UI)
> - Switch -ROOT- (and maybe even .META.?) to simply say "Yes" or a tick that it is fragmented
>   as it only has 0% or 100% available (since it has only a single region)
> - also computing the space occupied by each table and the total and - if easily done
>   - add a graph to display it (Google Pie Chart would be nice but is an external link)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

