hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17756) We should have better introspection of HFiles
Date Thu, 09 Mar 2017 05:40:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902528#comment-15902528 ]

stack commented on HBASE-17756:

What stats do we want on an hfile?

+ Rough count on each key instance?
+ Similar for key/value sizes?
+ Versions of Cells in an hfile (HBASE-12311 Version stats in HFiles?)
+ HBASE-7958 talked of row key distribution and cardinality, column family/column qualifier
cardinality, and a bunch of other possibilities.
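
To make the list above concrete, here is a rough sketch of how such stats could be gathered in one pass over the cells of a file. It is only an illustration: the tuple-based cell model and the hfile_stats helper are hypothetical, not the HBase Cell or HFile reader API, and the key size here counts only row + family + qualifier bytes.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical, simplified cell model: (row, family, qualifier, timestamp, value).
# This is NOT the HBase Cell API; it only illustrates the statistics.
cells = [
    (b"row1", b"cf", b"a", 3, b"v3"),
    (b"row1", b"cf", b"a", 2, b"v2"),
    (b"row1", b"cf", b"b", 1, b"x"),
    (b"row2", b"cf", b"a", 1, b"hello"),
]

def hfile_stats(cells):
    """Collect per-file stats in a single pass (assumes a non-empty input)."""
    cells_per_row = Counter()        # rough count of cells per row key
    versions = Counter()             # versions per (row, family, qualifier)
    key_sizes, value_sizes = [], []
    qualifiers = defaultdict(set)    # distinct qualifiers per column family
    for row, family, qualifier, _ts, value in cells:
        cells_per_row[row] += 1
        versions[(row, family, qualifier)] += 1
        key_sizes.append(len(row) + len(family) + len(qualifier))
        value_sizes.append(len(value))
        qualifiers[family].add(qualifier)
    return {
        "rows": len(cells_per_row),
        "max_cells_per_row": max(cells_per_row.values()),
        "max_versions": max(versions.values()),
        "key_size": (min(key_sizes), mean(key_sizes), max(key_sizes)),
        "value_size": (min(value_sizes), mean(value_sizes), max(value_sizes)),
        "qualifiers_per_family": {f: len(q) for f, q in qualifiers.items()},
    }

stats = hfile_stats(cells)
```

The exact sets and lists kept here grow with the file, which is the part a sketch-based approach would replace with bounded-size summaries.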

Later we could merge up hfile content to make a region stat... (

> We should have better introspection of HFiles
> ---------------------------------------------
>                 Key: HBASE-17756
>                 URL: https://issues.apache.org/jira/browse/HBASE-17756
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: HFile
>            Reporter: Esteban Gutierrez
> [~saint.ack@gmail.com] was suggesting to use DataSketches (https://datasketches.github.io)
> in order to write additional statistics to the HFiles. This could be used to improve our split
> decisions, aid troubleshooting, or potentially do other interesting analysis without having to
> perform full table scans. The statistics could be stored as part of the HFile but we could
> initially improve the visibility of the data by adding some statistics to HFilePrettyPrinter.
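
As a rough illustration of the kind of mergeable summary the description has in mind, below is a minimal K-minimum-values distinct-count estimator in plain Python. It is a stand-in written for this note, not the DataSketches library or its API; the KMVSketch class, the choice of k, and the SHA-1 based hash are all assumptions. The point it shows is that per-hfile sketches of row keys stay small and can be merged into a region-level estimate without rescanning the data.

```python
import hashlib

class KMVSketch:
    """K-minimum-values distinct-count estimator (illustrative only)."""

    def __init__(self, k=256):
        self.k = k
        self.hashes = set()  # the k smallest normalized hashes seen so far

    @staticmethod
    def _hash(item: bytes) -> float:
        # Map the item to a roughly uniform value between 0 and 1.
        digest = hashlib.sha1(item).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item: bytes) -> None:
        self.hashes.add(self._hash(item))
        if len(self.hashes) > self.k:
            # O(k) eviction; fine for a sketch, a heap would be faster.
            self.hashes.remove(max(self.hashes))

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        # The k smallest hashes of the union summarize the union of the inputs.
        merged = KMVSketch(min(self.k, other.k))
        merged.hashes = set(sorted(self.hashes | other.hashes)[:merged.k])
        return merged

    def estimate(self) -> float:
        if len(self.hashes) < self.k:
            return float(len(self.hashes))  # exact while under capacity
        return (self.k - 1) / max(self.hashes)

# Two hypothetical hfiles with overlapping row keys (7500 distinct in total).
hfile_a, hfile_b = KMVSketch(), KMVSketch()
for i in range(5000):
    hfile_a.add(b"row-%d" % i)
for i in range(2500, 7500):
    hfile_b.add(b"row-%d" % i)

region_estimate = hfile_a.merge(hfile_b).estimate()
```

With k=256 the estimate is approximate, with a relative error on the order of a few percent; a larger k trades space for accuracy.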

This message was sent by Atlassian JIRA