From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14864) Add support for bucketing of keys into client library
Date Fri, 20 Nov 2015 20:54:11 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George updated HBASE-14864:
--------------------------------
    Description: 
This has been discussed and taught so many times that I believe it is time to support it properly.
The idea is to be able to assign an optional _bucketing_ strategy to a table, which translates
the user-given row keys into a bucketed version. This is done either by a simple count or by
parts of the key. Possibly some simple functionality should help _compute_ bucket keys.
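
As a rough illustration of the "simple count" variant, the sketch below prefixes each row key with
one byte derived from a hash of the key. The interface and class names (BucketingStrategy,
CountBucketingStrategy) are hypothetical and not part of the HBase client API, and the one-byte
prefix layout is an assumption, not something this issue prescribes.

```java
import java.util.Arrays;

/** Hypothetical strategy interface; not part of the HBase client API. */
interface BucketingStrategy {
    /** Maps a user-given row key to the key that is actually stored. */
    byte[] toBucketedKey(byte[] userKey);

    /** Recovers the user-given row key from a stored key. */
    byte[] toUserKey(byte[] bucketedKey);
}

/** Bucketing by simple count: a one-byte prefix chosen by hashing the key (assumed layout). */
final class CountBucketingStrategy implements BucketingStrategy {
    private final int bucketCount;

    CountBucketingStrategy(int bucketCount) {
        if (bucketCount < 1 || bucketCount > 256) {
            throw new IllegalArgumentException("bucketCount must be in [1, 256]");
        }
        this.bucketCount = bucketCount;
    }

    /** Deterministic bucket index in [0, bucketCount). */
    int bucketOf(byte[] userKey) {
        return Math.floorMod(Arrays.hashCode(userKey), bucketCount);
    }

    @Override
    public byte[] toBucketedKey(byte[] userKey) {
        byte[] out = new byte[userKey.length + 1];
        out[0] = (byte) bucketOf(userKey);                      // bucket prefix
        System.arraycopy(userKey, 0, out, 1, userKey.length);   // original key follows
        return out;
    }

    @Override
    public byte[] toUserKey(byte[] bucketedKey) {
        // Drop the one-byte prefix to get the original key back.
        return Arrays.copyOfRange(bucketedKey, 1, bucketedKey.length);
    }
}
```

Because the prefix is derived from the key itself, a single-row read can recompute the bucket; only
range scans need to visit every bucket.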

For example, given a key {{<service>\-<epoch>\-<subgroup>-...}} you could
imagine that a rule can be defined that takes the _epoch_ part and chunks it into, for example,
5-minute buckets. This allows small time series to be stored together and makes reading (especially
over many servers) much more efficient.
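
A minimal sketch of such a rule follows. It assumes the epoch is in seconds, that "-" is the
delimiter, and that the service name itself contains no "-"; the class name EpochBucketRule and
the resulting "<service>-<bucketStart>" form are hypothetical choices made for illustration.

```java
/** Hypothetical rule: derives a time bucket from a "<service>-<epoch>-<subgroup>-..." key. */
final class EpochBucketRule {
    private final long bucketSeconds;

    EpochBucketRule(long bucketSeconds) {
        if (bucketSeconds <= 0) {
            throw new IllegalArgumentException("bucketSeconds must be positive");
        }
        this.bucketSeconds = bucketSeconds;
    }

    /** Start (in seconds) of the bucket that contains the given epoch. */
    long bucketStart(long epochSeconds) {
        return Math.floorDiv(epochSeconds, bucketSeconds) * bucketSeconds;
    }

    /** Returns "<service>-<bucketStart>" for a key of the assumed layout. */
    String bucketKey(String rowKey) {
        // Limit the split to 3 parts so the subgroup and anything after it stay in one piece.
        String[] parts = rowKey.split("-", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected <service>-<epoch>-...: " + rowKey);
        }
        long epoch = Long.parseLong(parts[1]);
        return parts[0] + "-" + bucketStart(epoch);
    }
}
```

With a 300-second bucket, all keys of one service whose epochs fall into the same five minutes map
to the same bucket key.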

The client also supports the proper scan logic to fan a scan out over the buckets as needed. There
may be an executor service (implicitly or explicitly provided) that is used to fetch the original
data with user-visible ordering from the distributed buckets.
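
The fan-out and re-ordering could look roughly like the sketch below: one task per bucket is
submitted to an executor service, and the per-bucket results (each already sorted) are merged
with a priority queue. Everything here is an assumption made for illustration: the class name
BucketScanMerger is hypothetical, keys are plain strings compared with String.compareTo rather
than byte arrays, and each bucket scan is a Callable returning an in-memory list instead of a
real scanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper: runs one scan per bucket and merges the sorted results. */
final class BucketScanMerger {
    /** Current head of one bucket's sorted result, plus the remainder of that result. */
    private static final class Cursor implements Comparable<Cursor> {
        final String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    /** Each Callable must return the user keys of one bucket in ascending order. */
    static List<String> scanAll(List<Callable<List<String>>> bucketScans, ExecutorService pool) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        try {
            // invokeAll blocks until every bucket scan has completed.
            for (Future<List<String>> f : pool.invokeAll(bucketScans)) {
                Iterator<String> it = f.get().iterator();
                if (it.hasNext()) {
                    heap.add(new Cursor(it.next(), it));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("bucket scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("bucket scan failed", e.getCause());
        }
        // K-way merge: repeatedly take the smallest head and advance that bucket's cursor.
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head);
            if (c.rest.hasNext()) {
                heap.add(new Cursor(c.rest.next(), c.rest));
            }
        }
        return merged;
    }
}
```

A real client would stream rows instead of materializing each bucket, but the merge step that
restores the user-visible order would stay the same.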

Note that this has been attempted a few times to various extents out in the field, but then
withered away. This is an essential feature that, when present in the API, will make users consider
this earlier, instead of when it is too late (when hotspotting occurs, for example).

The selected bucketing strategy and settings could be stored in the table descriptor key/value
pairs. This will allow any client to observe the strategy transparently. If not set, the behaviour
is the same as today, so the new feature does not touch any critical code path
and is fully client side. (But it could be considered for, say, UI support as well, if needed.)

The strategies are pluggable using classes, but a few default implementations are supplied.
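
To make the last two points concrete, here is a sketch of how a client might resolve a pluggable
strategy from table-level key/value pairs. A plain Map stands in for the table descriptor values,
and the key names ("bucketing.strategy.class", "bucketing.count") and all type names are
hypothetical; nothing here is an existing HBase API. When no strategy is set, the loader returns
an empty Optional, matching the "same as today" behaviour described above.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical plug-in contract; implementations need a public no-arg constructor. */
interface ConfiguredStrategy {
    void configure(Map<String, String> settings);

    String describe();
}

/** Example of a supplied default implementation: a fixed number of buckets. */
final class FixedCountStrategy implements ConfiguredStrategy {
    private int buckets = 1;

    public FixedCountStrategy() {
    }

    @Override
    public void configure(Map<String, String> settings) {
        buckets = Integer.parseInt(settings.getOrDefault(StrategyLoader.COUNT_KEY, "1"));
    }

    @Override
    public String describe() {
        return "fixed-count(" + buckets + ")";
    }
}

/** Resolves the strategy named in the table's key/value pairs, if any. */
final class StrategyLoader {
    static final String CLASS_KEY = "bucketing.strategy.class"; // hypothetical key name
    static final String COUNT_KEY = "bucketing.count";          // hypothetical key name

    static Optional<ConfiguredStrategy> load(Map<String, String> tableValues) {
        String className = tableValues.get(CLASS_KEY);
        if (className == null) {
            return Optional.empty(); // no strategy set: keys are used unchanged
        }
        try {
            ConfiguredStrategy strategy = Class.forName(className)
                    .asSubclass(ConfiguredStrategy.class)
                    .getDeclaredConstructor()
                    .newInstance();
            strategy.configure(tableValues);
            return Optional.of(strategy);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load bucketing strategy " + className, e);
        }
    }
}
```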


> Add support for bucketing of keys into client library
> -----------------------------------------------------
>
>                 Key: HBASE-14864
>                 URL: https://issues.apache.org/jira/browse/HBASE-14864
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client
>            Reporter: Lars George



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
