hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: Efficient time based queries - TIMERANGE or STARTROW/STOPROW?
Date Wed, 12 Apr 2017 19:05:14 GMT
If you change the rowkey, it would be tricky to satisfy what you stated in
your first email:

bq. retrieve the most recent 10 rows

Since the same uuid may have many rows which are no longer sorted in the
order amenable to your query.

Cheers

On Wed, Apr 12, 2017 at 10:43 AM, Josh <jofo90@gmail.com> wrote:

> Hi Ted,
>
> Thanks for the fast reply!
> Ok I see - just out of interest, if I changed my row key to be
> uuid#timestamp  (instead of uuid#reverse_timestamp) - would the timestamp
> approach still be equally efficient? I just want to understand whether or
> not the timestamp approach is relying on the ordering of my row keys.
>
> Josh
>
> On Wed, Apr 12, 2017 at 6:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Since STARTROW is specified (with uuid) in both of your examples, I think
> > their efficiency should be tantamount.
> >
> > Cheers
> >
> > On Wed, Apr 12, 2017 at 10:33 AM, Josh <jofo90@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am just getting started with HBase, and have a question about the
> > > efficiency of timestamp based scans.
> > >
> > > My table's row key has structure `uuid#reverse_timestamp` where
> > > reverse_timestamp is (java.lang.Long.MAX_VALUE - time in millis when
> the
> > > row was written). For a given uuid I want to be able to retrieve the
> most
> > > recent 10 rows in the table where timestamp is greater than x. It's
> > > possible that a given uuid may have many thousands of rows (with
> > different
> > > timestamps).
> > >
> > > I found there are two ways to run my query:
> > > 1. use HBase's built in timestamps and scan a time range:
> > > > scan 'mytable', {STARTROW => '647b2194-fbb8-46af-95ba-f498ddc8adcc',
> > > TIMERANGE => [x, current_time], LIMIT => 10}
> > >
> > > 2. use only my row keys to do the scan, with STARTROW and STOPROW:
> > > scan 'mytable', {STARTROW => '647b2194-fbb8-46af-95ba-f498ddc8adcc',
> > > STOPROW='647b2194-fbb8-46af-95ba-f498ddc8adcc#x', LIMIT => 10}
> > >
> > > Both of these seem to work - but is one more efficient that the other?
> > >
> > > Thanks for any advice,
> > > Josh
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message