hbase-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3162) Add TimeRange support into Increment to optimize for counters that are partitioned on time
Date Thu, 28 Oct 2010 06:22:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925706#action_12925706 ]

Todd Lipcon commented on HBASE-3162:

Another way of attacking this problem is a bit more general - something I've been thinking
about for a while but don't think I ever posted.

Right now we can use bloom filters on store files to decide whether a key might exist
in the file. It would be useful to add the ability to build a bloom filter on a _function_ of
the key. In other words, right now we check:
    if (bloomFilter.mightContain(key)) { look in file }
but instead we could check:
    if (bloomFilter.mightContain(function(key))) { look in file }
so that the current implementation is just the special case where the function is the identity.
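
To make the idea concrete, here is a minimal Python sketch of a bloom filter that hashes a function of the key rather than the key itself. It is only an illustration, not HBase's implementation: the class name, the SHA-256-based hashing, and the sizing parameters are all assumptions, and the date extractor assumes a hypothetical key layout whose third underscore-separated field is a date.

```python
import hashlib
from typing import Callable, Iterator


class FunctionBloomFilter:
    """Toy bloom filter that indexes key_fn(key) instead of the raw key."""

    def __init__(self, num_bits: int, num_hashes: int,
                 key_fn: Callable[[str], str] = lambda key: key) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.key_fn = key_fn  # identity by default, i.e. an ordinary key bloom
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Hash the *derived* value, salted with the hash index.
        value = self.key_fn(key).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + value).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def date_part(key: str) -> str:
    # Hypothetical layout: pageid_<id>_<yyyymmdd>_hits
    return key.split("_")[2]


# A per-file bloom on the date portion only.
file_bloom = FunctionBloomFilter(num_bits=4096, num_hashes=3, key_fn=date_part)
file_bloom.add("pageid_1234_20101027_hits")

if file_bloom.might_contain("pageid_9999_20101027_hits"):
    print("file may hold keys for this date: look in file")
```

Note that a filter built this way answers "does this file contain any key with this derived value?", so a positive result still requires looking in the file for the exact key.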

Getting back to the JIRA at hand, the idea is the following: if you are sharding your counters
by time, then the key would contain some time information. For example, you might have the counter
pageid_1234_20101027_hits to track page views for a given day. With current blooms we'd need a lot
of bits in the bloom filter to get a good false positive rate, but if instead the blooms were on just
the "20101027" portion of the key, there would be very few unique values, and thus we could get
a near-zero false positive rate with very little overhead.
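
To put rough numbers on the size difference, the standard sizing formula for an optimally configured bloom filter, m = -n ln(p) / (ln 2)^2 bits for n distinct entries at false positive rate p, can be evaluated for both cases. The entry counts and the 1% target below are made-up illustrations, not measurements from HBase.

```python
import math


def bloom_bits(num_entries: int, false_positive_rate: float) -> int:
    """Bits needed by an optimally sized bloom filter: -n ln(p) / (ln 2)^2."""
    bits = -num_entries * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits)


# Assumed workload: one store file holding a million distinct counter keys
# that span only 30 distinct dates.
full_key_bits = bloom_bits(1_000_000, 0.01)   # bloom on the whole key
date_only_bits = bloom_bits(30, 0.01)         # bloom on the date portion only

print(f"full-key bloom:  {full_key_bits} bits")
print(f"date-only bloom: {date_only_bits} bits")
```

Under these assumptions the date-only filter is several orders of magnitude smaller, which also means it could be made much larger than the formula requires, driving the false positive rate toward zero, while still costing almost nothing.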


> Add TimeRange support into Increment to optimize for counters that are partitioned on time
> ------------------------------------------------------------------------------------------
>                 Key: HBASE-3162
>                 URL: https://issues.apache.org/jira/browse/HBASE-3162
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Jonathan Gray
>            Priority: Minor
> In many use cases of increments, a given counter is only incremented during a specific
> window of time (i.e., the counters are partitioned/sharded by time).
> With this kind of schema, you are constantly creating new counters.  When a new counter
> is "created" (incremented the first time) you will always end up looking at a block from every
> file in the region because no previous value will exist.  However, with the new TimeRange
> optimizations that skip files if they don't contain values in the TimeRange you're interested
> in, we could utilize that information to optimize the Get within the increment.
> This would be optional and an addition to the Increment class.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
