From: Flavio Pompermaier
Date: Wed, 3 Jul 2013 10:05:16 +0200
Subject: Re: Help in designing row key
To: user@hbase.apache.org

Thank you very much for the great support!
This is how I plan to design my key:

PATTERN: source|type|qualifier|hash(name)|timestamp
EXAMPLE: google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753

Do you think this key is suitable for my use case (my searches will
essentially be by source or by source|type)?
Another point is that initially I will not have many sources, so I will
probably have only google|*, but in later phases there could be more
sources.

Best,
Flavio

On Tue, Jul 2, 2013 at 7:53 PM, Ted Yu wrote:

> For #1, yes - the client receives less data after filtering.
>
> For #2, please take a look at TestMultiVersions
> (./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
> for time range:
>
> scan = new Scan();
>
> scan.setTimeRange(1000L, Long.MAX_VALUE);
>
> For row key selection, you need a filter. Take a look at
> FuzzyRowFilter.java
>
> Cheers
>
> On Tue, Jul 2, 2013 at 10:35 AM, Flavio Pompermaier wrote:
>
> > Thanks for the reply! I thus have two more questions:
> >
> > 1) is it true that filtering on timestamps doesn't affect performance?
> > 2) could you send me a little snippet of how you would do such a filter
> > (by row key + timestamps)? For example, get all rows whose key starts
> > with 'someid-' and whose timestamp is greater than some timestamp?
> >
> > Best,
> > Flavio
> >
> > On Tue, Jul 2, 2013 at 6:25 PM, Ted Yu wrote:
> >
> > > bq. Using timestamp in row-keys is discouraged
> > >
> > > The above is true.
> > > Prefixing the row key with a timestamp would create a hot region.
> > >
> > > bq. should I filter by a simpler row-key plus a filter on timestamp?
> > >
> > > You can do the above.
> > >
> > > On Tue, Jul 2, 2013 at 9:13 AM, Flavio Pompermaier
> > > <pompermaier@okkam.it> wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > in my use case I have to perform batch analysis, skipping old data.
> > > > For example, I want to process all rows created after a certain
> > > > timestamp, passed as a parameter.
> > > >
> > > > What is the most effective way to do this?
> > > > Should I design my row-key to embed the timestamp?
> > > > Or is just filtering by the timestamp of the row fast as well? Or
> > > > what else?
> > > >
> > > > Initially I was thinking to compose my key as:
> > > > timestamp|source|title|type
> > > >
> > > > but:
> > > >
> > > > 1) Using timestamp in row-keys is discouraged
> > > > 2) If this design is ok, using this approach I still have problems
> > > > filtering by timestamp because I cannot find a way to filter
> > > > numerically (instead of alphanumerically/by string). Example:
> > > > 1372776400441|something has a timestamp less
> > > > than 1372778470913|somethingelse, but I cannot filter all rows whose
> > > > key is "numerically" greater than 1372776400441. Is it possible to
> > > > overcome this issue?
> > > > 3) If this design is not ok, should I filter by a simpler row-key
> > > > plus a filter on timestamp? Or what else?
> > > >
> > > > Best,
> > > > Flavio
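
For reference, here is a minimal sketch of the source|type|qualifier|hash(name)|timestamp key from the top of this message, and of the start/stop rows a scan by source or by source|type would use. It uses only the JDK, so everything in it is illustrative: the class and method names (RowKeySketch, rowKey, md5Hex, paddedTimestamp, stopRowFor) are hypothetical, MD5 is an assumption (the thread only says hash(name)), and zero-padding the timestamp to 13 digits is an assumption added so that string order and numeric order agree.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

class RowKeySketch {

    // Hex-encoded MD5 of the name. MD5 is an assumption: the thread only says hash(name).
    static String md5Hex(String name) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Fixed-width timestamp (assumption): with equal widths, lexicographic
    // comparison of the strings gives the same order as numeric comparison.
    static String paddedTimestamp(long millis) {
        return String.format(Locale.ROOT, "%013d", millis);
    }

    // Composes source|type|qualifier|hash(name)|timestamp.
    static String rowKey(String source, String type, String qualifier,
                         String name, long millis) {
        return String.join("|", source, type, qualifier,
                md5Hex(name), paddedTimestamp(millis));
    }

    // Exclusive stop row for a prefix scan: the prefix with its last character
    // incremented. Assumes a non-empty ASCII prefix, e.g. "google|" -> "google}".
    static String stopRowFor(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("google", "appliance", "oven", "some name", 1372837702753L));

        // Scan by source, then by source|type: rows in [start, stop) share the prefix.
        for (String prefix : new String[] {"google|", "google|appliance|"}) {
            System.out.println("start=" + prefix + " stop=" + stopRowFor(prefix));
        }
    }
}
```

With the HBase client, the start and stop strings would be converted to bytes and given to the Scan as its row range, and setTimeRange (as in the snippet quoted above) would restrict the result by cell timestamp. Note that setTimeRange works on the cell timestamps stored by HBase, not on the timestamp embedded at the end of the key.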