hbase-user mailing list archives

From Jameson Lopp <jame...@bronto.com>
Subject Re: setTimeRange for HBase Increment
Date Thu, 29 Sep 2011 19:40:57 GMT
Thanks! Nevertheless, can anyone confirm / deny if the scenario I 
described would play out in that manner? Just want to make sure I 
understand the functionality.

--
Jameson Lopp
Software Engineer
Bronto Software, Inc

On 09/29/2011 03:32 PM, Doug Meil wrote:
>
> Here are a few links on table cleanup and major compactions...
>
> http://hbase.apache.org/book.html#schema.minversions   (ttl related)
>
> http://hbase.apache.org/book.html#perf.deleting.queue
>
> http://hbase.apache.org/book.html#compaction
>
>
>
>
>
> On 9/29/11 2:29 PM, "Ted Yu"<yuzhihong@gmail.com>  wrote:
>
>> Doug Meil may point you to related doc.
>>
>> Take a look at this as well:
>> https://issues.apache.org/jira/browse/HBASE-4241
>>
>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp<jameson@bronto.com>  wrote:
>>
>>> Hm, well I didn't mention a number of other requirements for the feature
>>> I'm building, but long story short, I need to keep track of millions to
>>> billions of these counters and need the lookup time to be as close to
>>> constant time as possible, thus I was really hoping to avoid doing table
>>> scans.
>>>
>>> I'll admit I know nothing of the dangers of auto-pruning; is there an
>>> article / documentation I could read about it? Google wasn't very
>>> helpful.
>>>
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc
>>>
>>>
>>> On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:
>>>
>>>> My advice usually regarding timestamps is if it's part of your data
>>>> model, it should appear somewhere in an HBase key. 99% of the time
>>>> overloading the HBase timestamps is a bad idea, especially with
>>>> counters since there's auto-pruning done in the Memstore!
>>>>
>>>> I would suggest you make time part of your row key, maybe one counter
>>>> per day, and then set the TTL on your table to 30 days. Then all you
>>>> need to do is a sequential scan for those 30 days maybe with a prefix
>>>> that refers to some event id.
>>>>
>>>> OpenTSDB is another way of doing it: http://opentsdb.net/
>>>>
>>>> J-D
>>>>
>>>> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp<jameson@bronto.com> wrote:
>>>>
>>>>> I wish to store a count of 30-day trailing event data (e.g. # of clicks
>>>>> in past 30 days) and ended up reading the documentation for
>>>>> setTimeRange in the Increment operation.
>>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>>>>
>>>>> I was hoping someone could clarify if it works as I'm imagining in this
>>>>> example scenario.
>>>>>
>>>>> 1) Current click count is 0
>>>>>
>>>>> 2) I process a click and I perform an increment operation with the time
>>>>> range set to minStamp = now and maxStamp = 30 days from now
>>>>>
>>>>> 3) I query for the value immediately and find it to be 1
>>>>>
>>>>> 4) Assuming no other clicks come in, if I query for the value in 31
>>>>> days, it will be returned as 0
>>>>>
>>>>> In essence, I'm looking for a way to set a TTL on my increment
>>>>> operation. Is this how it actually works? The documentation is a bit
>>>>> vague and I could imagine several other scenarios.
>>>>> --
>>>>> Jameson Lopp
>>>>> Software Engineer
>>>>> Bronto Software, Inc
>>>>>
>>>>>
>
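
A minimal sketch of the layout J-D suggests in the quoted reply: one counter per event per day, with the day in the row key, summed over the trailing 30 days with a bounded range scan. It uses an in-memory sorted list as a stand-in for the table and is not the HBase client API. The class name DailyCounters, the "<event id>|<YYYYMMDD>" key format, and the method names are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_left, insort
from datetime import date, timedelta


class DailyCounters:
    """In-memory stand-in for a table sorted by row key (NOT the HBase API)."""

    def __init__(self):
        self._keys = []    # sorted row keys, mimicking lexicographic row order
        self._values = {}  # row key -> counter value

    @staticmethod
    def row_key(event_id, day):
        # Hypothetical key layout: "<event id>|<YYYYMMDD>", so that the rows
        # for one event sort in chronological order.
        return f"{event_id}|{day:%Y%m%d}"

    def increment(self, event_id, day, amount=1):
        # Bump the counter for this event on this day, creating the row if needed.
        key = self.row_key(event_id, day)
        if key not in self._values:
            insort(self._keys, key)
            self._values[key] = 0
        self._values[key] += amount
        return self._values[key]

    def trailing_total(self, event_id, today, days=30):
        # Range scan over [start, stop): the last `days` daily rows for this
        # event, today included. At most `days` rows are read per lookup.
        start = self.row_key(event_id, today - timedelta(days=days - 1))
        stop = self.row_key(event_id, today + timedelta(days=1))
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        return sum(self._values[k] for k in self._keys[lo:hi])


counters = DailyCounters()
first_click = date(2011, 9, 29)
counters.increment("click", first_click)
print(counters.trailing_total("click", first_click))
print(counters.trailing_total("click", first_click + timedelta(days=31)))
```

In this sketch the old daily rows simply fall outside the scanned range; nothing deletes them. In the design described in the thread, the table's TTL of 30 days is what would eventually remove them.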
