hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8784) Wildcard/Range/Partition Delete Support
Date Fri, 21 Jun 2013 16:16:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690437#comment-13690437 ]

Lars Hofhansl commented on HBASE-8784:

Interesting idea. You want to store these in the HFiles' metadata block? Or somehow in .META.?
Or in a new "DeleteRange" table?

We have a similar problem, and we resorted to coprocessor hooks that run during compactions:
the hook reads the metadata from somewhere else and then filters out data matching some
criteria. Obviously this only works in "delete this stuff some time when you get to it"
type scenarios.
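A minimal sketch of the compaction-time filtering idea in plain Java. It does not use the HBase coprocessor API; `SimpleCell`, `CompactionFilterSketch`, `olderThan` and `rowPrefix` are hypothetical names, and the delete criteria are passed in directly instead of being read from an external store as described above.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for an HBase cell: only a row key and a timestamp.
final class SimpleCell {
    final byte[] row;
    final long timestamp;

    SimpleCell(byte[] row, long timestamp) {
        this.row = row;
        this.timestamp = timestamp;
    }
}

// Hypothetical compaction filter: any cell matching one of the criteria
// is simply not written to the compacted output.
final class CompactionFilterSketch {
    private final List<Predicate<SimpleCell>> criteria;

    CompactionFilterSketch(List<Predicate<SimpleCell>> criteria) {
        this.criteria = criteria;
    }

    // Returns the cells that survive compaction, in input order.
    List<SimpleCell> compact(List<SimpleCell> input) {
        return input.stream()
                .filter(cell -> criteria.stream().noneMatch(c -> c.test(cell)))
                .collect(Collectors.toList());
    }

    // Example criterion: drop cells written before a cutoff timestamp.
    static Predicate<SimpleCell> olderThan(long cutoff) {
        return cell -> cell.timestamp < cutoff;
    }

    // Example criterion: drop cells whose row key starts with the given bytes.
    static Predicate<SimpleCell> rowPrefix(byte[] prefix) {
        return cell -> {
            if (cell.row.length < prefix.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if (cell.row[i] != prefix[i]) {
                    return false;
                }
            }
            return true;
        };
    }
}
```

Because matching cells disappear only when a compaction happens to rewrite the files that hold them, reads in between still return the data. That is why this approach only suits the "delete it eventually" scenarios mentioned above.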

> Wildcard/Range/Partition Delete Support
> ---------------------------------------
>                 Key: HBASE-8784
>                 URL: https://issues.apache.org/jira/browse/HBASE-8784
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, Deletes, regionserver
>            Reporter: Lars George
> We often see use cases where users, for example with time-series data, would like to delete
> large ranges of data, basically like dropping a partition as supported by RDBMSs.
> We should support regular expressions or range expressions for the matches (supporting binary
> keys, obviously).
> The idea is to store the deletes not with the data, but in the metadata. When we read files
> we read the larger deletes first, and then the inline ones. Of course, this should be reserved
> for a few but very data-intensive deletes. This reduces the number of deletes to write to one,
> instead of many (often thousands, if not millions). This is different from the BulkDeleteEndpoint
> introduced in HBASE-6942. It should support similar Scan-based selectivity.
> The new range deletes will mask out all the matching data and otherwise be handled like
> other deletes, for example being dropped during major compactions once all masked data has
> been dropped too.
> To be discussed is how and where we store the delete entry in practice, since metadata
> might not be wanted. But it seems like a reasonable choice. The DeleteTracker can handle the
> delete the same way, with additional checks for wildcards/ranges. If these deletes are not used,
> no critical path is affected, so there is no additional latency or other regression.
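A minimal sketch of how a DeleteTracker-style check might consult the proposed range and regex deletes before the inline ones. Every name here (`WideDelete`, `RangeDelete`, `RegexDelete`, `WideDeleteTracker`) is hypothetical and is not HBase's actual `DeleteTracker` interface. It assumes row-level deletes carrying a single timestamp, and it matches regexes against row keys decoded as ISO-8859-1 so that every byte value maps to exactly one character.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical "wide" delete: one entry that can mask many rows.
interface WideDelete {
    boolean masks(byte[] row, long cellTs);
}

// Masks rows in [startRow, stopRow) written at or before ts.
final class RangeDelete implements WideDelete {
    private final byte[] startRow; // inclusive
    private final byte[] stopRow;  // exclusive
    private final long ts;

    RangeDelete(byte[] startRow, byte[] stopRow, long ts) {
        this.startRow = startRow;
        this.stopRow = stopRow;
        this.ts = ts;
    }

    @Override
    public boolean masks(byte[] row, long cellTs) {
        return cellTs <= ts
                && compareUnsigned(row, startRow) >= 0
                && compareUnsigned(row, stopRow) < 0;
    }

    // Lexicographic comparison with bytes treated as unsigned (binary-safe key order).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}

// Masks rows whose key matches a regex, written at or before ts.
final class RegexDelete implements WideDelete {
    private final Pattern pattern;
    private final long ts;

    RegexDelete(String regex, long ts) {
        // DOTALL so '.' also matches bytes such as 0x0A that decode to line terminators.
        this.pattern = Pattern.compile(regex, Pattern.DOTALL);
        this.ts = ts;
    }

    @Override
    public boolean masks(byte[] row, long cellTs) {
        return cellTs <= ts
                && pattern.matcher(new String(row, StandardCharsets.ISO_8859_1)).matches();
    }
}

// Hypothetical tracker: wide deletes (e.g. loaded from file metadata) are checked
// first, then the regular inline point deletes.
final class WideDeleteTracker {
    private final List<WideDelete> wideDeletes = new ArrayList<>();
    private final Map<String, Long> pointDeletes = new HashMap<>(); // row -> newest delete ts

    void addWideDelete(WideDelete d) {
        wideDeletes.add(d);
    }

    void addPointDelete(byte[] row, long ts) {
        pointDeletes.merge(key(row), ts, Long::max);
    }

    boolean isDeleted(byte[] row, long cellTs) {
        // With no wide deletes registered, this loop does no work.
        for (WideDelete d : wideDeletes) {
            if (d.masks(row, cellTs)) {
                return true;
            }
        }
        Long deleteTs = pointDeletes.get(key(row));
        return deleteTs != null && cellTs <= deleteTs;
    }

    private static String key(byte[] row) {
        return new String(row, StandardCharsets.ISO_8859_1);
    }
}
```

Dropping the wide deletes during major compaction, once all the data they mask has been removed, is not shown here.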

This message is automatically generated by JIRA.