Date: Tue, 24 Jul 2012 10:29:27 -0400
Subject: Re: Schema for sorted results
From: Jean-Marc Spaggiari
To: user@hbase.apache.org

Hi Hari,

Why do you think it's wasteful?

Let's imagine this situation:
Key=||| Value = nothing.

And this one:
Key= Value = ||

Both situations will, in the end, take almost the same amount of space in the database.

You can also do something like this:
Key= ColumnFamilyName= Value=|

The difference is that the first option will allow you to retrieve the information you are looking for very quickly.

Now, are you sure that this key is really what you need? What will be the access pattern for your database? With the key you are using, you will have to search by date first. So if you want to find all the entries for one URL, you will have to scan the entire table, jumping to the next date each time you find it. If you are searching by date, then this key is good.

So you first need to think about how you are going to read your data; then you will be able to design a key that matches your needs.

JM

2012/7/24, Minh Duc Nguyen :
> Hari,
>
> According to the HBase book: http://hbase.apache.org/book.html#dm.sort
>
> All data model operations HBase return data in sorted order. First by row,
> then by ColumnFamily, followed by column qualifier, and finally timestamp
> (sorted in reverse, so newest records are returned first).
>
> ~ Minh
>
> On Tue, Jul 24, 2012 at 9:50 AM, Hari Prasanna wrote:
>
>> Hello -
>>
>> I'm using HBase for web server log processing and I'm trying to save
>> the top N urls per category per day in a sorted manner in HBase.
>> From what I've read, the only sortable structure that HBase offers is the
>> lexicographic sort in the row keys. So, here is the rowkey format I'm
>> currently using
>> |||
>> where, padded_visits = Long.MAX_VALUE - visits
>>
>> This seems wasteful because of the long rowkeys. Is there any other
>> approach to maintain sorted results in HBase?
>>
>> Thanks
>> Hari Prasanna
>>
>
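
For reference, here is a minimal sketch of the padded_visits trick from the original question, showing why Long.MAX_VALUE - visits makes a lexicographic sort return the most visited URLs first. The field order (date, category, padded_visits, url), the "|" separator, the sample values, and the class and method names are all assumptions made for illustration; the thread does not show the exact key layout. The sketch builds plain Java strings and does not call the HBase client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class SortedRowKeySketch {

    // Hypothetical layout: date|category|padded_visits|url (assumed, not from the thread).
    static String rowKey(String date, String category, long visits, String url) {
        // Long.MAX_VALUE has 19 decimal digits, so zero-padding to 19 digits
        // makes string order match numeric order. Assumes visits >= 0.
        String paddedVisits = String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - visits);
        return date + "|" + category + "|" + paddedVisits + "|" + url;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(rowKey("20120724", "news", 15, "/a"));
        keys.add(rowKey("20120724", "news", 2000, "/b"));
        keys.add(rowKey("20120724", "news", 300, "/c"));

        // String order equals byte order for ASCII-only keys, which is how
        // HBase compares row keys.
        Collections.sort(keys);

        // Keys come out with the highest visit count first: /b, /c, /a.
        for (String key : keys) {
            System.out.println(key);
        }
    }
}
```

Because the date is the leading field in this assumed layout, a scan over one date prefix returns that day's rows already sorted, while a lookup by URL alone would still require scanning across dates, as noted in the reply above.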