hbase-issues mailing list archives

From "Doug Meil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4147) StoreFile query usage report
Date Wed, 10 Aug 2011 19:16:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082580#comment-13082580 ]

Doug Meil commented on HBASE-4147:

For those not familiar with Statspack, see this:  http://www.akadia.com/services/ora_statspack_survival_guide.html

re:  "STATSPACK is a diagnosis tool for instance-wide performance problems; it also supports
application tuning activities by providing data which identifies high-load SQL statements.
STATSPACK can be used both proactively to monitor the changing load on a system, and also
reactively to investigate a performance problem."

re: "The STATSPACK reports we like are from 15-minute intervals during a busy or peak time,
when the performance is at its worst."
That's exactly what I'm talking about...  small intervals.  Not too small, but not too large.

re: "Another common mistake with STATSPACK is to gather snapshots only when there is a problem."
That's why this type of reporting should pretty much be 'always on' - you need to be able
to compare to other points in time.

Again, some things don't translate 1:1 from the RDBMS world, but a lot does.  

> StoreFile query usage report
> ----------------------------
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>         Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this
> could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output
> to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path
> we would also know region, CF, and table), # of times the StoreFile was accessed, the size
> of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are being
> accessed the most, and including the StoreFile would provide insight into relative "uncompaction"
> (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming
> that users will slice and dice this data on their own so I think we should skip any kind of
> admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file
> would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.
> However, flying a plane without any instrumentation isn't fun.  :-)
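
To make the proposed report concrete, here is a rough sketch of the bookkeeping the description above implies: count accesses, size, and total time per StoreFile during an interval, then emit one log line per StoreFile and reset. It is written in Python purely for illustration (HBase itself is Java), and none of it is HBase code. The names `StoreFileUsageReport`, `record`, `snapshot_and_reset`, `format_report`, and `accesses_by_table_cf` are all hypothetical, as is the log line format. The roll-up also assumes StoreFile paths look like `/hbase/<table>/<region>/<cf>/<storefile>`; that layout is an assumption and should be checked against the actual deployment.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StoreFileStats:
    size_bytes: int = 0    # size of the StoreFile as last observed
    accesses: int = 0      # number of times it was accessed this interval
    total_ms: float = 0.0  # total time (ms) spent processing it this interval


class StoreFileUsageReport:
    """Hypothetical per-interval accumulator; not an HBase API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats = defaultdict(StoreFileStats)

    def record(self, path, size_bytes, elapsed_ms):
        """Note one access to the StoreFile at `path`."""
        with self._lock:
            stats = self._stats[path]
            stats.size_bytes = size_bytes
            stats.accesses += 1
            stats.total_ms += elapsed_ms

    def snapshot_and_reset(self):
        """Return this interval's stats and start a fresh interval."""
        with self._lock:
            snapshot, self._stats = self._stats, defaultdict(StoreFileStats)
        return dict(snapshot)


def format_report(snapshot):
    """One log line per StoreFile, sorted by path (format is made up)."""
    return [
        f"storefile={path} accesses={s.accesses} "
        f"size_bytes={s.size_bytes} total_ms={s.total_ms:.1f}"
        for path, s in sorted(snapshot.items())
    ]


def accesses_by_table_cf(snapshot):
    """Roll accesses up to (table, CF), using the assumed path layout."""
    totals = defaultdict(int)
    for path, s in snapshot.items():
        parts = path.strip("/").split("/")
        table, cf = parts[-4], parts[-2]  # .../<table>/<region>/<cf>/<file>
        totals[(table, cf)] += s.accesses
    return dict(totals)


if __name__ == "__main__":
    report = StoreFileUsageReport()
    report.record("/hbase/t1/r1/cf1/sf1", 1024, 2.5)
    report.record("/hbase/t1/r1/cf1/sf2", 2048, 4.0)
    for line in format_report(report.snapshot_and_reset()):
        print(line)
```

In a real implementation, a timer firing on the configured interval (e.g., every 30 or 60 seconds) would call the equivalent of `snapshot_and_reset` and write the lines to the log. Several StoreFiles appearing under one (table, CF) pair within an interval is the "uncompaction" signal the description mentions.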

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

