hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase timestamp consistency across multiple region servers?
Date Fri, 14 Mar 2014 21:22:23 GMT
This is kind of a Y answer to an X-Y question.

> I want to use time stamp to order the updates by time. These updates
> are issued from multiple machines.

> I was thinking to use global counter (stored in a separated HBase
> table) but I guess that counter table might become a hot spot since
> each update needs to update this table.

There are two possible answers to this question as posed.

1. You want HBase to order your updates by timestamp. This happens
naturally.

It is already strongly recommended that you run NTP on all of your HBase
servers as a matter of good distributed system hygiene.  If you don't
specify an explicit timestamp in your mutations then HBase will use the
current server time when persisting your values, and you will have updates
ordered by time, to within whatever clock skew remains between servers.
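As a rough illustration of why synchronized clocks matter for this, here
is a toy model (not HBase code; the RegionServer class, its
clock_offset_ms field and the offsets are all made up for the example).
Each simulated server stamps a write with its own local clock, and a
reader then sorts by that timestamp:

```python
from dataclasses import dataclass


@dataclass
class RegionServer:
    name: str
    clock_offset_ms: int  # hypothetical skew relative to "true" time

    def assign_timestamp(self, true_time_ms):
        # Stand-in for a server stamping a mutation with its local clock.
        return true_time_ms + self.clock_offset_ms


def ordered_by_timestamp(writes):
    """Sort (label, timestamp) pairs the way a timestamp-ordered reader would."""
    return [label for label, ts in sorted(writes, key=lambda w: w[1])]


rs1 = RegionServer("rs1", clock_offset_ms=0)
rs2 = RegionServer("rs2", clock_offset_ms=50)  # rs2 runs 50 ms fast

# Cell B is written first (true time 1000) on rs2; cell A 20 ms later on rs1.
writes = [("B", rs2.assign_timestamp(1000)), ("A", rs1.assign_timestamp(1020))]
print(ordered_by_timestamp(writes))
```

In this model the skew (50 ms) is larger than the gap between the two
writes (20 ms), so A sorts before B even though B was written first.
Keeping the skew small, which is what NTP is for, shrinks the window in
which that can happen.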


2. You want to retrieve updates by timestamp. In other words, you don't
merely want HBase to order updates by time; you also want to have a time
component as the row key or as part of a composite row key.

There are several schema design solutions to this. You can use Apache
Phoenix with salted keys. You can use Sematext's HBaseWD library. You can
use a separate distributed process for time ordered keys (strictly
speaking, k-ordered) such as Twitter's Snowflake. Choose one that looks
like it would work best for your use case.
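
To make the last two ideas a bit more concrete, here is a minimal sketch
that combines them. It is not the Snowflake, Phoenix or HBaseWD code. The
class and function names, the epoch value and the bucket count are
assumptions chosen for illustration. The ID layout (timestamp bits, then
worker bits, then a per-millisecond sequence) follows the general
Snowflake scheme, and the salt is simply a one-byte bucket derived from a
checksum of the rest of the key:

```python
import struct
import threading
import time
import zlib

EPOCH_MS = 1_288_834_974_657  # arbitrary fixed epoch chosen for this sketch
WORKER_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class KOrderedIdGenerator:
    """Snowflake-style 64-bit IDs: timestamp | worker id | sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # injectable so the sketch can be tested
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Refuse to hand out IDs that would sort before earlier ones.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & MAX_SEQ
                if self.seq == 0:
                    # Sequence exhausted for this millisecond; wait for the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            ts = now - EPOCH_MS
            return (ts << (WORKER_BITS + SEQ_BITS)) | (self.worker_id << SEQ_BITS) | self.seq


def salted_row_key(event_id, buckets=16):
    """Prefix a big-endian ID with a one-byte bucket to spread writes."""
    body = struct.pack(">Q", event_id)
    salt = zlib.crc32(body) % buckets
    return bytes([salt]) + body
```

Each writer process would get its own worker_id, so no shared counter
table is needed. IDs from one generator are strictly increasing, and IDs
from different generators are ordered only roughly, to within clock skew.
The salt byte spreads consecutive IDs over several key ranges instead of
one, at the cost that a time-range read has to scan every bucket and
merge the results.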



On Fri, Mar 14, 2014 at 2:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. if cell A is updated after cell B, is it guaranteed that the time stamp
> of cell A is always bigger than the time stamp of cell B
>
> As you mentioned, machines might be out of sync on time, the above may not
> always be true.
>
>
> On Fri, Mar 14, 2014 at 1:45 PM, S. Zhou <myxjtu@yahoo.com> wrote:
>
> > Here is what I am trying to figure out: in the same table,  if cell A is
> > updated after cell B, is it guaranteed that the time stamp of cell A is
> > always bigger than the time stamp of cell B, even cell A and cell B could
> > be stored on different machines (therefore these two machines might out
> > of sync on time)?
> >
> > The reason I am asking this question is: I want to use time stamp to
> > order the updates by time. These updates are issued from multiple
> > machines. I was thinking to use global counter (stored in a separated
> > HBase table) but I guess that counter table might become a hot spot
> > since each update needs to update this table.
> >
> > My general problem is: I want to sort the updates stored in Hbase from
> > multiple machines. Please let me know if you have good thoughts.
> >
> > Thanks a lot
> > Senqiang
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
