Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 93194200C5B for ; Thu, 13 Apr 2017 05:36:31 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 91B6D160BA8; Thu, 13 Apr 2017 03:36:31 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id D7E8E160B95 for ; Thu, 13 Apr 2017 05:36:30 +0200 (CEST) Received: (qmail 80279 invoked by uid 500); 13 Apr 2017 03:36:28 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 80267 invoked by uid 99); 13 Apr 2017 03:36:28 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Apr 2017 03:36:28 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id B02861A0139 for ; Thu, 13 Apr 2017 03:36:27 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.121 X-Spam-Level: X-Spam-Status: No, score=-0.121 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id gG8mFWdJc7Dz for ; Thu, 13 Apr 2017 03:36:24 +0000 (UTC) Received: from mail-vk0-f54.google.com (mail-vk0-f54.google.com [209.85.213.54]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id E37E35FAE1 for ; Thu, 13 Apr 2017 03:36:23 +0000 (UTC) Received: by mail-vk0-f54.google.com with SMTP id j127so16402735vkh.0 for ; Wed, 12 Apr 2017 20:36:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=CW+rG22sOlOid5FTPcrhMtn3tMxtAdnoEtdhlk2sKEA=; b=d7vUXFdIy20RosqCIjfzAd1japrsiuO0YRM0gDWVJgN/4t8KqD7AibkWRX4j7mUBOF L60NIAd/D7PMmwli7CmvXo1OgWaNqdDvjq6V1gNXGxZKs4lnoGd06L85OfdtVOMJLny2 /ucMu1ZJXI9qbMG5oKtPknVehDeWl1iYVqp7SaVzBjJAIGLX6Nc22+B0FEJxNw0J/aMy w9pkdIWj/bGNg5VHeLZhXhab+MJXKjHSmBvl4vEsuL6jcl1hb+4UyV+GXNsi6gKrawoe OfIqiE/YmwkxJvCvb2dQPvNtDl+NlRgNDN4ug66kDuWMorDcsV4UsXkni+G1xtAwQKtE OGqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=CW+rG22sOlOid5FTPcrhMtn3tMxtAdnoEtdhlk2sKEA=; b=cymzkZPNDomtRp27nQzy95vn9fLpKLBJ6I1M5LKZCA3O7T1VAFiuXiWGPeQ7Suai0G AA2EZ2vDtR7OXuU8BV/9tdGuY6jK7z2oW28N/Tcu4ILBjtbrFKzSF1EqI8WiHVg7P4cF tOJm9kPbD/I3BAdqiy7OoC9zG9ODp3088JgRHGjfY4cORGq/XAfkARAzcUYPZzwqmIHT SZmBPCQBpvlAfVX801b8iF30KjOqy+evqx3reyMEGczKZlJxGMWFuq8UVWnRKVekOxSn lGukZ+P+tubidl4cseHRxF84XSccTno9eKJ/QZ4oAoYW3fMgg1nIyyDIvaH0k4GnnvU8 APBQ== X-Gm-Message-State: AN3rC/5yMy835ou1oeG9ql3LzzJ8Uwd8+ow4KRKmQRZxr0ddCqLgkYPm FQJkLY4ZftaiR2S3IGyl9JwcRUKZX64i X-Received: by 10.31.134.196 with SMTP id i187mr441829vkd.56.1492054577091; Wed, 12 Apr 2017 20:36:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.103.80.203 with HTTP; Wed, 12 Apr 2017 20:36:16 -0700 (PDT) In-Reply-To: References: From: Josh Elser Date: Wed, 12 Apr 2017 23:36:16 -0400 Message-ID: Subject: Re: Efficient time based queries - TIMERANGE or STARTROW/STOPROW? To: user@hbase.apache.org Content-Type: text/plain; charset=UTF-8 archived-at: Thu, 13 Apr 2017 03:36:31 -0000 If it's helpful to state it in generic terms: specifying a range of HBase timestamps is *only* a post-filter (server-side) and *never* a primary search criteria. In other words, searching by the HBase timestamp is a full-table scan (exhaustive search). While the timestamp can be nice for certain use-cases, I think it best to use it sparingly and only for its original purpose (preserving old versions of a cell), and not as a "query parameter". I've seen way too many use-cases where people think they're being tricky by using timestamps to implement a poor-man's search. This works for small amounts of data, but quickly falls to its knees with non-trivial data-sets. On Wed, Apr 12, 2017 at 3:05 PM, Ted Yu wrote: > If you change the rowkey, it would be tricky to satisfy what you stated in > your first email: > > bq. retrieve the most recent 10 rows > > Since the same uuid may have many rows which are no longer sorted in the > order amenable to your query. > > Cheers > > On Wed, Apr 12, 2017 at 10:43 AM, Josh wrote: > >> Hi Ted, >> >> Thanks for the fast reply! >> Ok I see - just out of interest, if I changed my row key to be >> uuid#timestamp (instead of uuid#reverse_timestamp) - would the timestamp >> approach still be equally efficient? I just want to understand whether or >> not the timestamp approach is relying on the ordering of my row keys. >> >> Josh >> >> On Wed, Apr 12, 2017 at 6:39 PM, Ted Yu wrote: >> >> > Since STARTROW is specified (with uuid) in both of your examples, I think >> > their efficiency should be tantamount. >> > >> > Cheers >> > >> > On Wed, Apr 12, 2017 at 10:33 AM, Josh wrote: >> > >> > > Hi, >> > > >> > > I am just getting started with HBase, and have a question about the >> > > efficiency of timestamp based scans. >> > > >> > > My table's row key has structure `uuid#reverse_timestamp` where >> > > reverse_timestamp is (java.lang.Long.MAX_VALUE - time in millis when >> the >> > > row was written). For a given uuid I want to be able to retrieve the >> most >> > > recent 10 rows in the table where timestamp is greater than x. It's >> > > possible that a given uuid may have many thousands of rows (with >> > different >> > > timestamps). >> > > >> > > I found there are two ways to run my query: >> > > 1. use HBase's built in timestamps and scan a time range: >> > > > scan 'mytable', {STARTROW => '647b2194-fbb8-46af-95ba-f498ddc8adcc', >> > > TIMERANGE => [x, current_time], LIMIT => 10} >> > > >> > > 2. use only my row keys to do the scan, with STARTROW and STOPROW: >> > > scan 'mytable', {STARTROW => '647b2194-fbb8-46af-95ba-f498ddc8adcc', >> > > STOPROW='647b2194-fbb8-46af-95ba-f498ddc8adcc#x', LIMIT => 10} >> > > >> > > Both of these seem to work - but is one more efficient that the other? >> > > >> > > Thanks for any advice, >> > > Josh >> > > >> > >>