From: Peter Wolf
Date: Fri, 02 Mar 2012 16:59:52 -0500
To: user@hbase.apache.org
Subject: Re: Scanning the last N rows
Message-ID: <4F5142D8.5080005@gmail.com>
References: <4F513ED3.5030907@gmail.com>

Sorry, my code was a little off. It should have been

    Scan scan = new Scan(calculateStartRowKey(targetAccount),
                         calculateEndRowKey(targetAccount));

Where my row key is formed from

So, the scanner would match all the rows for this account, and return them most recent first.

    Iterator<Result> it = scanner.iterator();

But if I stop doing this...

    Result result = it.next();

Will that be efficient? Will it be a problem that the scanner potentially matches all rows for the account?

P

On 3/2/12 4:49 PM, Ian Varley wrote:
> Yes, you do have to worry about efficiency.
> If your rows aren't ordered in the table (by rowkey) according to the update date, the server will have to scan the entire table. Your filter will let it avoid sending all of those results to the client, but it still has to read them from disk and merge them with the rows in memory. That will likely not even be feasible for a big table (and, if it's not a *big* table, it probably shouldn't be in HBase).
>
> The fundamental thing to note here is that there's no "magic": HBase stores records sorted in exactly one order; if what you want can't be found efficiently according to that ordering, then you'll be scanning the whole table. Relational DBs do that too, but they also have indexes that let you get at things quickly in some other sort order.
>
> Ian
>
> On Mar 2, 2012, at 3:42 PM, Peter Wolf wrote:
>
> > Ah ha! So the row key orders the results, I just do an unbounded Scan,
> > and stop after N iterations.
> >
> > Like this...
> >
> >     Scan scan = new Scan();
> >     Filter filter = new SingleColumnValueFilter(...);
> >     scan.setFilter(filter);
> >     ResultScanner scanner = hTable.getScanner(scan);
> >     Iterator<Result> it = scanner.iterator();
> >     for (int i = 0; i < 1000 && it.hasNext(); i++) {
> >         Result result = it.next();
> >         ... do stuff with result...
> >     }
> >
> > Do I have to worry about efficiency? Is the Server madly retrieving
> > rows, in the background, that the Client will never use?
> >
> > Thanks
> > P
> >
> > On 3/2/12 4:31 PM, Doug Meil wrote:
> > > Hi there-
> > >
> > > Take a look at this section of the book...
> > >
> > > http://hbase.apache.org/book.html#reverse.timestamp
> > >
> > > On 3/2/12 4:02 PM, "Peter Wolf" wrote:
> > > > Hello all,
> > > >
> > > > I want to retrieve the most recent N rows from a table, with some column
> > > > qualifiers.
> > > >
> > > > I can't find a Filter, or anything obvious in my books, or via Google.
> > > >
> > > > What is the idiom for doing this?
> > > >
> > > > Thanks
> > > > Peter
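
For reference, here is a minimal sketch of the reverse-timestamp idea discussed in this thread, written so it runs without a cluster. It is an illustration under assumptions, not the HBase client API: a TreeMap ordered by unsigned byte comparison stands in for HBase's sorted row store, and the names ReverseTimestampKeys, rowKey, newTable and latest are hypothetical. The key layout (account bytes followed by Long.MAX_VALUE minus the timestamp) is also an assumption, and it only behaves correctly if all account ids have the same length. It needs Java 9 or later for Arrays.compareUnsigned.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class ReverseTimestampKeys {

    // Lexicographic, unsigned byte order: the stand-in for how rows are sorted.
    static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    // Hypothetical key layout: account bytes, then 8 bytes of
    // (Long.MAX_VALUE - timestamp), so newer rows sort first.
    // Assumes fixed-length account ids and non-negative timestamps.
    static byte[] rowKey(String account, long timestampMillis) {
        byte[] acct = account.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(acct.length + Long.BYTES)
                .put(acct)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // In-memory stand-in for a table sorted by row key.
    static TreeMap<byte[], String> newTable() {
        return new TreeMap<>(ROW_ORDER);
    }

    // Bounded range scan over one account, stopping after n rows.
    static List<String> latest(TreeMap<byte[], String> table, String account, int n) {
        // Start key: reversed value 0, the newest timestamp that can exist.
        byte[] start = rowKey(account, Long.MAX_VALUE);
        // Stop key (exclusive): account bytes followed by eight 0xFF bytes,
        // which sorts after every valid key for this account.
        byte[] acct = account.getBytes(StandardCharsets.UTF_8);
        byte[] stop = ByteBuffer.allocate(acct.length + Long.BYTES)
                .put(acct)
                .putLong(-1L)
                .array();

        List<String> out = new ArrayList<>();
        for (String value : table.subMap(start, true, stop, false).values()) {
            if (out.size() == n) {
                break; // stop early: only the first n rows in key order are read
            }
            out.add(value);
        }
        return out;
    }
}
```

With keys built this way, the start and stop keys play the role of calculateStartRowKey and calculateEndRowKey in the message above: the scan touches only one account's rows, and reading stops once N rows have been consumed. Against a real cluster the client may still fetch somewhat more than N rows, since scanners typically fetch rows in batches.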