hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12311) Version stats in HFiles?
Date Wed, 22 Oct 2014 15:39:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180065#comment-14180065 ]

Andrew Purtell commented on HBASE-12311:
----------------------------------------

Well, we had HBASE-7958, but it fizzled out. One issue seemed to be that maintaining a stats
table duplicates the metrics reporting and metrics aggregation/history that will already be in
place externally (?). So I proposed
(https://issues.apache.org/jira/browse/HBASE-7958?focusedCommentId=13997314&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13997314)
not surfacing the stats calculated when processing HFiles into a system table, but instead
keeping them as internal metadata that HFile/HStore could get at. The proposal was "maintain a
tree of statistic files in HDFS", but this information could be embedded in the HFiles
themselves. The information there could also be exported to the metrics subsystem. Should we
revive that issue? Per-block HFile statistics would be something new, though, I think.

> Version stats in HFiles?
> ------------------------
>
>                 Key: HBASE-12311
>                 URL: https://issues.apache.org/jira/browse/HBASE-12311
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>
> In HBASE-9778 I basically punted to the user the decision on whether to do repeated
> scanner.next() calls instead of issuing (re)seeks.
> I think we can do better.
> One way to do that is to maintain simple stats on the maximum number of versions we've
> seen for any row/col combination and store these in the HFile's metadata (just like the
> timerange, oldest Put, etc).
> Then we can estimate fairly accurately whether we have to expect lots of versions (i.e. seeking
> between columns is better) or not (in which case we'd issue repeated next()'s).
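
The idea in the description above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not HBase code: the class VersionStatsSketch, its
methods, and the skipThreshold parameter are all hypothetical names, and the rule "seek when
the worst-case number of skipped cells reaches a threshold" is one possible heuristic rather
than anything decided in the issue. It relies on cells being written in sorted order, so that
all versions of one row/col combination arrive consecutively.

```java
import java.util.Objects;

/** Hypothetical sketch; none of these names are HBase APIs. */
final class VersionStatsSketch {
    private String lastRow;
    private String lastColumn;
    private long currentVersions; // versions seen so far for the current row/col
    private long maxVersions;     // largest version count seen for any row/col

    /** Call once per cell, in the sorted order in which cells are written to a file. */
    void observe(String row, String column) {
        if (Objects.equals(row, lastRow) && Objects.equals(column, lastColumn)) {
            currentVersions++;
        } else {
            // A new row/col combination starts; restart the running count.
            lastRow = row;
            lastColumn = column;
            currentVersions = 1;
        }
        maxVersions = Math.max(maxVersions, currentVersions);
    }

    /** The value that would be stored in the file's metadata when the file is closed. */
    long maxVersions() {
        return maxVersions;
    }

    /**
     * Assumed heuristic: if a scan that wants versionsWanted versions per column could
     * have to step over at least skipThreshold extra cells, prefer a seek to the next
     * column; otherwise repeated next() calls are expected to be cheaper.
     */
    static boolean preferSeek(long maxVersionsInFile, int versionsWanted, int skipThreshold) {
        long worstCaseSkipped = maxVersionsInFile - versionsWanted;
        return worstCaseSkipped >= skipThreshold;
    }
}
```

A scanner would read the stored maximum once when it opens a file and make the seek-or-next
choice per file, instead of leaving that choice to the user.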



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
