hbase-issues mailing list archives

From "Liyin Tang (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-5032) Add other DELETE or DELETE into the delete bloom filter
Date Wed, 28 Dec 2011 23:55:30 GMT

     [ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5032:
------------------------------

    Description: 
To speed up time-range scans, we need to seek to the maximum timestamp of the requested range instead
of going to the first KV of the (row, column) pair and iterating from there. If we don't know
the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current
row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max)
from there. We can only skip over to the timerange_max timestamp when we know that there are
no DeleteColumn records at the top of that row/column with a higher timestamp. We can use
another Bloom filter keyed on (row, column) to find that out quickly. (From HBASE-4962)
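A minimal Python sketch of the seek decision described above, assuming a simple Bloom filter keyed on (row, column) that records DeleteColumn markers. The class `RowColBloom`, the function `seek_target`, the SHA-256-based hashing and the filter sizes are hypothetical illustrations, not HBase's BloomFilter API. KVs within a column are assumed to be sorted newest-first, as in HBase, so the "top" of a column is its latest timestamp.

```python
import hashlib
from typing import Iterator, Tuple

# Sentinel for "newest possible version"; mirrors HBase's HConstants.LATEST_TIMESTAMP (Long.MAX_VALUE).
LATEST_TIMESTAMP = 2**63 - 1


class RowColBloom:
    """Minimal Bloom filter over (row, column) pairs that carry a DeleteColumn marker.
    Hypothetical illustration only; not HBase's BloomFilter API."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, row: bytes, column: bytes) -> Iterator[int]:
        # Length-prefix the row so (b"ab", b"c") and (b"a", b"bc") hash differently.
        key = len(row).to_bytes(4, "big") + row + column
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, row: bytes, column: bytes) -> None:
        for pos in self._positions(row, column):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, row: bytes, column: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(row, column))


def seek_target(bloom: RowColBloom, row: bytes, column: bytes,
                timerange_max: int) -> Tuple[bytes, bytes, int]:
    """Return the (row, column, timestamp) key to seek to for a time-range scan."""
    if bloom.might_contain(row, column):
        # A DeleteColumn with a higher timestamp may exist: start at the top of the column.
        return (row, column, LATEST_TIMESTAMP)
    # No DeleteColumn for this (row, column): skip straight to timerange_max.
    return (row, column, timerange_max)
```

With nothing recorded, seek_target(RowColBloom(), b"r", b"cf:q", 1000) returns (b"r", b"cf:q", 1000), so a single seek lands in the requested range. A Bloom false positive only costs the extra iteration from the top of the column; it never skips a real DeleteColumn.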

So the motivation is to save seek operations for time-range scan queries when we know there is
no delete for this row/column.

From the implementation perspective, we already have a delete family bloom filter
which contains all the

  was:
Previously, the delete family bloom filter only contained the row keys that have a delete
family marker. It helps us avoid the top-row seek operation.

This JIRA attempts to add delete column markers into this bloom filter as well (renaming
the delete family bloom filter to the delete bloom filter).
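One way to hold both kinds of markers in a single delete bloom filter is to tag each key with its type, so a row-only DeleteFamily entry can never be confused with a (row, column) DeleteColumn entry. The sketch below assumes that tagging scheme; the one-byte tags and the helper names are hypothetical, not HBase's on-disk format. It uses a Python set as an exact-membership stand-in for the Bloom filter, so unlike a real Bloom filter it never reports false positives.

```python
from typing import Set

# Hypothetical one-byte tags distinguishing the two kinds of entries.
FAMILY_TAG = b"\x01"  # DeleteFamily marker: keyed on row only
COLUMN_TAG = b"\x02"  # DeleteColumn marker: keyed on (row, column)


def delete_family_key(row: bytes) -> bytes:
    return FAMILY_TAG + row


def delete_column_key(row: bytes, column: bytes) -> bytes:
    # Length-prefix the row so (b"ab", b"c") and (b"a", b"bc") encode differently.
    return COLUMN_TAG + len(row).to_bytes(4, "big") + row + column


class DeleteBloomStandIn:
    """Exact-membership stand-in for one combined 'delete bloom filter'.
    A real Bloom filter may return false positives but never false negatives."""

    def __init__(self) -> None:
        self._keys: Set[bytes] = set()

    def add_delete_family(self, row: bytes) -> None:
        self._keys.add(delete_family_key(row))

    def add_delete_column(self, row: bytes, column: bytes) -> None:
        self._keys.add(delete_column_key(row, column))

    def may_have_delete_family(self, row: bytes) -> bool:
        return delete_family_key(row) in self._keys

    def may_have_delete_column(self, row: bytes, column: bytes) -> bool:
        return delete_column_key(row, column) in self._keys
```

With this layout, a scanner could first check may_have_delete_family(row) to decide whether the top-row seek is needed, and then may_have_delete_column(row, column) before jumping directly to the timestamp of interest.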

The motivation is to save seek operations for time-range scan queries when we know there is no
delete column for this row/column.
We can then seek directly to the exact timestamp we are interested in, instead of seeking to the
latest timestamp and repeatedly skipping forward to find out whether there is any delete column
before the timestamp of interest.

        Summary: Add other DELETE or DELETE into the delete bloom filter (was: Add DELETE COLUMN into the delete bloom filter)
    
> Add other DELETE or DELETE  into the delete bloom filter
> --------------------------------------------------------
>
>                 Key: HBASE-5032
>                 URL: https://issues.apache.org/jira/browse/HBASE-5032
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
