hbase-issues mailing list archives

From "Doug Meil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4147) StoreFile query usage report
Date Fri, 29 Jul 2011 13:23:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072807#comment-13072807 ]

Doug Meil commented on HBASE-4147:
----------------------------------

I think that instrumenting StoreFileScanner to gather the time spent in all of its 'next' and
'seek' calls would do it.  Then, on 'close', it would publish the detailed record to some
internal service that would gather up all of these detail records and periodically dump
a summary.

I'm doing some hand-waving here because we don't want to introduce concurrency issues in the
publishing process (e.g., publishing to something that is synchronized will effectively single-thread
StoreFileScanners which would be a non-starter), but based on my understanding of the code
it seems like this would be a fairly targeted change.  
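
A minimal sketch of one way to do the publishing without a shared lock, assuming Java 8+
(for LongAdder and computeIfAbsent).  None of this is HBase code: UsageCollector,
ScannerTimer and the example path are hypothetical names for illustration only.  Each
scanner accumulates its own timings in a plain field and touches shared state once, on
close.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class StoreFileUsageSketch {

    /** Per-StoreFile totals. LongAdder keeps updates lock-free under contention. */
    static final class Totals {
        final LongAdder accesses = new LongAdder();
        final LongAdder nanos = new LongAdder();
    }

    /** Hypothetical internal service that gathers the detail records. */
    static final class UsageCollector {
        private final ConcurrentHashMap<String, Totals> byPath = new ConcurrentHashMap<>();

        /** Called once per scanner, on close. No synchronized block is involved. */
        void publish(String storeFilePath, long elapsedNanos) {
            Totals t = byPath.computeIfAbsent(storeFilePath, p -> new Totals());
            t.accesses.increment();
            t.nanos.add(elapsedNanos);
        }

        /**
         * Builds the periodic summary and resets the counters. Updates that race
         * with the reset may land in either period, so the numbers are approximate.
         * Entries are never evicted here; a real version would have to handle that.
         */
        Map<String, String> drainSummary() {
            Map<String, String> out = new TreeMap<>();
            for (Map.Entry<String, Totals> e : byPath.entrySet()) {
                long accesses = e.getValue().accesses.sumThenReset();
                long nanos = e.getValue().nanos.sumThenReset();
                if (accesses > 0) {
                    out.put(e.getKey(), "accesses=" + accesses + " timeMs=" + nanos / 1_000_000);
                }
            }
            return out;
        }
    }

    /** Stand-in for the per-scanner bookkeeping; used by a single thread only. */
    static final class ScannerTimer {
        private final String storeFilePath;
        private final UsageCollector collector;
        private long nanos;

        ScannerTimer(String storeFilePath, UsageCollector collector) {
            this.storeFilePath = storeFilePath;
            this.collector = collector;
        }

        /** Record the time spent in one 'next' or 'seek' call. */
        void record(long elapsedNanos) {
            nanos += elapsedNanos;
        }

        /** Publish the accumulated detail record exactly once. */
        void close() {
            collector.publish(storeFilePath, nanos);
        }
    }

    public static void main(String[] args) {
        UsageCollector collector = new UsageCollector();
        ScannerTimer timer = new ScannerTimer("/hbase/t1/r1/cf/f1", collector);
        timer.record(2_000_000L); // pretend a seek took 2 ms
        timer.close();
        // A scheduled task would do this on the configured interval and log each line.
        collector.drainSummary().forEach((path, line) -> System.out.println(path + " " + line));
    }
}
```

The trade-off in this sketch is accuracy at period boundaries in exchange for never
blocking a scanner; whether that is acceptable is part of the open question.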

Thoughts?  

> StoreFile query usage report
> ----------------------------
>
>                 Key: HBASE-4147
>                 URL: https://issues.apache.org/jira/browse/HBASE-4147
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Doug Meil
>            Priority: Minor
>
> Detailed information on what HBase is doing in terms of reads is hard to come by.
> What would be useful is to have a periodic StoreFile query report.  Specifically, this
> could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output
> to the log files.
> This would have all StoreFiles accessed during the reporting period (and with the Path
> we would also know region, CF, and table), # of times the StoreFile was accessed, the size
> of the StoreFile, and the total time (ms) spent processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are being
> accessed the most, and including the StoreFile would provide insight into relative "uncompaction"
> (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  I'm assuming
> that users will slice and dice this data on their own so I think we should skip any kind of
> admin view for now (i.e., new JSPs, new APIs to expose this data).  Just getting this to log-file
> would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes it will.
> However, flying a plane without any instrumentation isn't fun.  :-)
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
