hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: Timestamp resolution
Date Thu, 12 Jun 2014 05:39:56 GMT
The issues you cite are all orthogonal. We have client vs. RS time now and we have clock skew now;
both are completely independent of the time resolution.

I explained the need I saw for this before. Lemme include:

On Fri, May 23, 2014 at 06:16PM, lars hofhansl wrote:
> The specific discussion here was a transaction engine doing snapshot
> isolation using the HBase timestamps while staying as close to wall clock
> time as possible.
> In that scenario, with ms resolution you can only do 1000 transactions/sec
> (each transaction needs a distinct timestamp), so you need to turn the
> timestamp into something that is not wall clock time as HBase understands
> it (and hence TTL, etc. no longer work, nor do any other tools you've
> written that use the HBase timestamp).
> 1M transactions/sec is good enough (for now; I envision that in a few years
> we'll be sitting here wondering how we could ever have thought 1M
> transactions/sec was sufficient) :)
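
To put numbers on this: if every transaction needs its own timestamp, the resolution caps throughput, and the 63 usable bits of the signed 8-byte timestamp cap the range. A quick sketch that just redoes the arithmetic for ms, us and ns, using the same 365.24-day year as the original proposal further down; `TimestampCapacity` is a made-up name and only the JDK's `TimeUnit` is used.

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope numbers for an 8-byte (signed, so 63 usable bits) timestamp. */
public class TimestampCapacity {

    /** Distinct timestamp values per second, i.e. the ceiling on transactions/sec
     *  when every transaction needs its own timestamp. */
    public static long slotsPerSecond(TimeUnit unit) {
        return unit.convert(1, TimeUnit.SECONDS);
    }

    /** Years covered by Long.MAX_VALUE ticks, using 365.24-day years as in the thread. */
    public static double yearsCovered(TimeUnit unit) {
        double ticksPerSecond = slotsPerSecond(unit);
        return Long.MAX_VALUE / ticksPerSecond / 3600 / 24 / 365.24;
    }

    public static void main(String[] args) {
        TimeUnit[] units = {TimeUnit.MILLISECONDS, TimeUnit.MICROSECONDS, TimeUnit.NANOSECONDS};
        for (TimeUnit unit : units) {
            System.out.printf("%-12s %,15d ts/sec %,20.3f years%n",
                    unit, slotsPerSecond(unit), yearsCovered(unit));
        }
    }
}
```

For ns this comes out at about 292.28 years and for us at about 292,279 years, which is why microseconds is the finest unit that comfortably fits.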

The point is: even if you had a timestamp oracle (one that resolves ms and fills in below ms
resolution with a counter), there'd be no way to use its values as the HBase timestamp while
staying close to wall clock time (so that TTL, etc. still work), because the HBase timestamp
only has ms resolution.
So, specifically, I was not advocating an automatic higher time resolution (as far as I know,
that cannot be done reliably in Java across multiple cores). I was advocating allowing clients
with access to a timestamp oracle (perhaps, but not necessarily, single-threaded) to store
those timestamps and still make use of all HBase optimizations (filtering HFiles, TTL, etc.).
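
A minimal sketch of such an oracle, assuming the microseconds-since-epoch encoding from the original proposal: wall-clock ms times 1000, with a counter filling the sub-ms part. `MicrosTimestampOracle` is a hypothetical name, not an HBase class. Because the values stay anchored to wall-clock time, dividing by 1000 recovers ms for TTL-style checks.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical client-side timestamp oracle (not an HBase class).
 * Issues strictly increasing timestamps in microseconds since the epoch:
 * wall-clock millis * 1000, with a counter filling the sub-ms part.
 */
public class MicrosTimestampOracle {
    private static final long MICROS_PER_MILLI = 1000L;

    private final LongSupplier clockMillis;   // injectable so tests can fake the clock
    private long lastIssued = Long.MIN_VALUE;

    public MicrosTimestampOracle() {
        this(System::currentTimeMillis);
    }

    public MicrosTimestampOracle(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** One lock for all callers: effectively a single-threaded oracle. */
    public synchronized long next() {
        long wallMicros = clockMillis.getAsLong() * MICROS_PER_MILLI;
        // New ms: start at counter 0. Same ms (or the clock stepped back):
        // continue counting from the last issued value. More than 1000 calls
        // in one ms therefore spill into the next ms, running slightly ahead
        // of the wall clock until it catches up.
        lastIssued = Math.max(wallMicros, lastIssued + 1);
        return lastIssued;
    }

    /** Wall-clock millis for TTL-style comparisons. */
    public static long toMillis(long micros) {
        return micros / MICROS_PER_MILLI;
    }
}
```

Letting the counter spill into the next ms keeps the values monotonic without blocking; spinning until the clock ticks over would be the alternative if never running ahead of wall-clock time matters more.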

-- Lars

 From: Michael Segel <michael_segel@hotmail.com>
To: dev@hbase.apache.org 
Cc: lars hofhansl <larsh@apache.org> 
Sent: Wednesday, June 11, 2014 2:03 PM
Subject: Re: Timestamp resolution

Weirdly enough I find that I have to agree with Andrew. 

First, how do you get time in units smaller than a ms? 
Second, clock skew becomes an issue. 
Third, which clock are you using? The client machine? The RS? And then how do you synchronize
each RS to within a ms of the others? 
Correct me if I’m wrong, but NTP doesn’t give that close a sync.  

Sorry, but really, not a good idea. 

If you want this… you can store the temporal data as a column. 
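
One way to read that suggestion, as a sketch: keep the regular ms cell timestamp and put a finer-grained time into an ordinary column value. The 12-byte layout (8-byte epoch seconds followed by 4-byte nanos, big-endian) and the class name `InstantColumnCodec` are assumptions for illustration only. For instants at or after 1970, unsigned lexicographic byte order of this encoding matches chronological order.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

/** Hypothetical codec for storing a nanosecond-precision time as a column value. */
public class InstantColumnCodec {

    /** 8-byte epoch seconds followed by 4-byte nanos, big-endian (ByteBuffer's default). */
    public static byte[] encode(Instant t) {
        return ByteBuffer.allocate(12)
                .putLong(t.getEpochSecond())
                .putInt(t.getNano())
                .array();
    }

    /** Inverse of encode(). */
    public static Instant decode(byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        long seconds = buf.getLong();
        int nanos = buf.getInt();
        return Instant.ofEpochSecond(seconds, nanos);
    }
}
```

Such a value is plain cell data: TTL and HFile time-range filtering only look at the cell timestamp, which is the limitation Lars raises in his reply.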

Time really is relative. 

On May 25, 2014, at 12:53 AM, Stack <stack@duboce.net> wrote:

> On Fri, May 23, 2014 at 5:27 PM, lars hofhansl <larsh@apache.org> wrote:
>> We have discussed this in the past. It just came up again during an
>> internal discussion.
>> Currently we simply store a Java timestamp (millisec since epoch), i.e. we
>> have ms resolution.
>> We do have 8 bytes for the TS, though. Not enough to store nanosecs (that
>> would only cover 2^63/10^9/3600/24/365.24 = 292.279 years), but enough for
>> microseconds (292279 years).
>> Should we just store the TS in microseconds? We could do that right now
>> (and just keep ms resolution for now, i.e. the us part would always be
>> 0 for now).
>> Existing data must be in ms of course, so we'd grandfather that in, but
>> new tables could store by default in us.
>> We'd need to make this configurable at both the column family level and
>> the client level, so clients could still opt to see data in ms.
>> Comments? Too much to bite off?
>> -- Lars
> I'm a fan.  As Enis cites, HBASE-8927 has good discussion.  No
> configuration I'd say.  Just move to the new regime (though I suppose we
> should let you turn it off).
> I think it was Liu Shaohui (IIRC) who suggested combining ms and nanos
> in a synchronized block that stamps the ts on Cells (left-shift the
> currentTimeMillis and fill in the bottom bytes with as much of the nanos
> as fits; i.e. your micros). Rather than nanos/micros, we could instead
> use a counter when a Cell arrives in the same ms. Having all ops go
> through one code block to get 'time' across cores and handlers would be
> costly, though.
> St.Ack
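
A sketch of the left-shift-plus-counter variant Stack describes, assuming 10 low bits for the counter (1024 stamps per ms, leaving 53 bits of ms, roughly 285,000 years of range). `ShiftedTimestampStamper`, `COUNTER_BITS` and the borrow-the-next-ms behavior are illustrative choices, not anything HBase implements. The single `synchronized` method is exactly the funnel whose cost Stack points out.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical stamper: high bits hold wall-clock millis, the low
 * COUNTER_BITS bits hold a per-millisecond counter.
 */
public class ShiftedTimestampStamper {
    public static final int COUNTER_BITS = 10;                   // assumption: 1024 slots per ms
    public static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    private final LongSupplier clockMillis;
    private long lastMillis = -1;
    private long counter = 0;

    public ShiftedTimestampStamper() {
        this(System::currentTimeMillis);
    }

    public ShiftedTimestampStamper(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Every caller goes through this one lock. */
    public synchronized long stamp() {
        long now = clockMillis.getAsLong();
        if (now > lastMillis) {
            lastMillis = now;           // new ms: reset the counter
            counter = 0;
        } else if (counter < COUNTER_MASK) {
            counter++;                  // same ms (or clock went back): next slot
        } else {
            lastMillis++;               // slots exhausted: borrow the next ms
            counter = 0;
        }
        return (lastMillis << COUNTER_BITS) | counter;
    }

    public static long millisOf(long ts) {
        return ts >>> COUNTER_BITS;
    }

    public static long counterOf(long ts) {
        return ts & COUNTER_MASK;
    }
}
```

Unlike a times-1000 microsecond encoding, getting ms back here is a shift (`ts >>> COUNTER_BITS`), so any ms-based tooling would have to know the shift.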