hbase-issues mailing list archives

From "Manukranth Kolloju (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-9815) Add Histogram representative of row key distribution inside a region.
Date Tue, 22 Oct 2013 16:13:43 GMT

     [ https://issues.apache.org/jira/browse/HBASE-9815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manukranth Kolloju updated HBASE-9815:
--------------------------------------

    Description: 
Using histogram information, users can parallelize the scan workload into equal-sized scans
based on the estimated size from the histogram information. This will help in enabling systems
which are trying to perform queries on top of HBase to do cost-based optimization while scanning.
The idea is to keep this histogram information in the HFile in the trailer and populate this
on compaction and flush. 

The HRegionInterface can expose an API to return the histogram information of a region, which
can be generated by merging the histograms of all the HFiles.

Implementing the histogram on the basis of 
http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
http://dl.acm.org/citation.cfm?id=1951376
and NumericHistogram from Hive.

  was:
Using histogram information, users can parallelize the scan workload into equal sized scans
based on the estimated size from the Histogram information. This will help in enabling systems
which are trying to perform queries on top of HBase to do cost based optimization while scanning.
The Idea is to keep this histogram information into the HFile in the trailer and populate
this on compaction and/or flush. 

The HRegionInterface can expose an API to return the Histogram information of a region, which
can be generated by merging histograms of all the hfiles.

Implementing the histogram on the basis of 
http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
http://dl.acm.org/citation.cfm?id=1951376
and NumericHistogram from hive.


> Add Histogram representative of row key distribution inside a region.
> ---------------------------------------------------------------------
>
>                 Key: HBASE-9815
>                 URL: https://issues.apache.org/jira/browse/HBASE-9815
>             Project: HBase
>          Issue Type: New Feature
>          Components: HFile
>    Affects Versions: 0.89-fb
>            Reporter: Manukranth Kolloju
>            Assignee: Manukranth Kolloju
>             Fix For: 0.89-fb
>
>
> Using histogram information, users can parallelize the scan workload into equal-sized
> scans based on the estimated size from the histogram information. This will help in enabling
> systems which are trying to perform queries on top of HBase to do cost-based optimization
> while scanning. The idea is to keep this histogram information in the HFile in the trailer
> and populate this on compaction and flush. 
> The HRegionInterface can expose an API to return the histogram information of a region,
> which can be generated by merging the histograms of all the HFiles.
> Implementing the histogram on the basis of 
> http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
> http://dl.acm.org/citation.cfm?id=1951376
> and NumericHistogram from Hive.
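
The proposal above can be illustrated with a minimal Java sketch. It is not the HBase patch and not Hive's NumericHistogram API: the class name StreamingHistogram, its methods, and the linear interpolation used for split points are all assumptions made for illustration. It also assumes row keys have already been mapped to numeric values, which the issue does not specify. The sketch keeps a bounded number of (centroid, count) bins in the style of the Ben-Haim and Tom-Tov paper, merges histograms the way per-HFile histograms might be combined into a per-region one, and derives boundaries for roughly equal-sized scans.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative bounded-size streaming histogram (hypothetical class, not HBase code). */
class StreamingHistogram {
    // Each bin is {centroid, count}; the list is kept sorted by centroid.
    private final List<double[]> bins = new ArrayList<>();
    private final int maxBins;

    StreamingHistogram(int maxBins) {
        if (maxBins < 2) throw new IllegalArgumentException("maxBins must be >= 2");
        this.maxBins = maxBins;
    }

    /** Adds one observation, e.g. a row key already mapped to a number (assumption). */
    void add(double value) {
        insert(value, 1.0);
        shrink();
    }

    /** Folds another histogram into this one, as when combining per-HFile histograms. */
    void merge(StreamingHistogram other) {
        for (double[] b : other.bins) insert(b[0], b[1]);
        shrink();
    }

    int binCount() { return bins.size(); }

    double totalCount() {
        double total = 0;
        for (double[] b : bins) total += b[1];
        return total;
    }

    /** Returns parts - 1 boundaries that split the data into roughly equal-count ranges. */
    double[] splitPoints(int parts) {
        if (parts < 1 || bins.isEmpty()) {
            throw new IllegalArgumentException("need parts >= 1 and a non-empty histogram");
        }
        double total = totalCount();
        double[] result = new double[parts - 1];
        for (int k = 1; k < parts; k++) result[k - 1] = valueAtCount(total * k / parts);
        return result;
    }

    // Inserts a bin at its sorted position, or adds to an existing bin with the same centroid.
    private void insert(double centroid, double count) {
        int i = 0;
        while (i < bins.size() && bins.get(i)[0] < centroid) i++;
        if (i < bins.size() && bins.get(i)[0] == centroid) {
            bins.get(i)[1] += count;
        } else {
            bins.add(i, new double[] {centroid, count});
        }
    }

    // Repeatedly merges the two closest adjacent bins until at most maxBins remain.
    private void shrink() {
        while (bins.size() > maxBins) {
            int best = 0;
            double bestGap = Double.MAX_VALUE;
            for (int i = 0; i + 1 < bins.size(); i++) {
                double gap = bins.get(i + 1)[0] - bins.get(i)[0];
                if (gap < bestGap) { bestGap = gap; best = i; }
            }
            double[] a = bins.get(best);
            double[] b = bins.remove(best + 1);
            double total = a[1] + b[1];
            a[0] = (a[0] * a[1] + b[0] * b[1]) / total; // count-weighted centroid
            a[1] = total;
        }
    }

    // Simplification: treats half of each bin's count as lying on either side of its
    // centroid and interpolates linearly between neighbouring centroids.
    private double valueAtCount(double target) {
        double prevCum = bins.get(0)[1] / 2.0;
        if (target <= prevCum) return bins.get(0)[0];
        for (int i = 1; i < bins.size(); i++) {
            double[] lo = bins.get(i - 1);
            double[] hi = bins.get(i);
            double cum = prevCum + lo[1] / 2.0 + hi[1] / 2.0;
            if (target <= cum) {
                double frac = (target - prevCum) / (cum - prevCum);
                return lo[0] + frac * (hi[0] - lo[0]);
            }
            prevCum = cum;
        }
        return bins.get(bins.size() - 1)[0];
    }
}
```

In this sketch, a caller would build one histogram per file with add, combine them with merge, and pass the desired number of parallel scans to splitPoints to obtain scan boundaries. The interpolation is deliberately simpler than the procedure in the paper, so the boundaries are only approximate.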



--
This message was sent by Atlassian JIRA
(v6.1#6144)
