Date: Thu, 3 Feb 2011 13:35:48 -0800
Subject: Re: Fastest way to read only the keys of a HTable?
From: Something Something
To: user@hbase.apache.org

After adding the following line:

scan.addFamily(Bytes.toBytes("Info"));

performance improved dramatically (Thank you both!).  But now I want it to
perform even faster, if possible -:)  To read 43 rows, it's taking 2 seconds.
Eventually, the 'partner' table may have over 500 entries.

I guess I will try moving the recently added family to a different table.
Do you think that might help?

Thanks again.

On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray wrote:

> If you only need to consider a single column family, use Scan.addFamily()
> on your scanner.  Then there will be no impact of the other column families.
>
> > -----Original Message-----
> > From: Something Something [mailto:mailinglists19@gmail.com]
> > Sent: Thursday, February 03, 2011 11:28 AM
> > To: user@hbase.apache.org
> > Subject: Re: Fastest way to read only the keys of a HTable?
> >
> > Hmm.. performance hasn't improved at all.  Do you see anything wrong with
> > the following code:
> >
> >     public List getPartners() {
> >         ArrayList partners = new ArrayList();
> >
> >         try {
> >             HTable table = new HTable("partner");
> >             Scan scan = new Scan();
> >             scan.setFilter(new FirstKeyOnlyFilter());
> >             ResultScanner scanner = table.getScanner(scan);
> >             Result result = scanner.next();
> >             while (result != null) {
> >                 Partner partner = new Partner(Bytes.toString(result.getRow()));
> >                 partners.add(partner);
> >                 result = scanner.next();
> >             }
> >         } catch (IOException e) {
> >             throw new RuntimeException(e);
> >         }
> >         return partners;
> >     }
> >
> > Maybe I shouldn't use more than one "column family" in a HTable - but the
> > BigTable paper recommends that, doesn't it?  Please advise and thanks for
> > your help.
> >
> > On Wed, Feb 2, 2011 at 10:55 PM, Stack wrote:
> >
> > > I don't see a getKey on Result.  Use
> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> > >
> > > Here is how it's used in the shell table.rb class:
> > >
> > >     # Count rows in a table
> > >     def count(interval = 1000, caching_rows = 10)
> > >       # We can safely set scanner caching with the first key only filter
> > >       scan = org.apache.hadoop.hbase.client.Scan.new
> > >       scan.cache_blocks = false
> > >       scan.caching = caching_rows
> > >       scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
> > >
> > >       # Run the scanner
> > >       scanner = @table.getScanner(scan)
> > >       count = 0
> > >       iter = scanner.iterator
> > >
> > >       # Iterate results
> > >       while iter.hasNext
> > >         row = iter.next
> > >         count += 1
> > >         next unless (block_given? && count % interval == 0)
> > >         # Allow command modules to visualize counting process
> > >         yield(count, String.from_java_bytes(row.getRow))
> > >       end
> > >
> > >       # Return the counter
> > >       return count
> > >     end
> > >
> > > St.Ack
> > >
> > > On Thu, Feb 3, 2011 at 6:47 AM, Something Something wrote:
> > > > Thanks.  So I will add this...
> > > >
> > > > scan.setFilter(new FirstKeyOnlyFilter());
> > > >
> > > > But after I do this...
> > > >
> > > > Result result = scanner.next();
> > > >
> > > > There's no... result.getKey() - so what method would give me the Key
> > > > value?
> > > >
> > > > On Wed, Feb 2, 2011 at 10:20 PM, Stack wrote:
> > > >
> > > >> See
> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> > > >> St.Ack
> > > >>
> > > >> On Thu, Feb 3, 2011 at 6:01 AM, Something Something wrote:
> > > >> > I want to read only the keys in a table.  I tried this...
> > > >> >
> > > >> >     try {
> > > >> >         HTable table = new HTable("myTable");
> > > >> >         Scan scan = new Scan();
> > > >> >         scan.addFamily(Bytes.toBytes("Info"));
> > > >> >         ResultScanner scanner = table.getScanner(scan);
> > > >> >         Result result = scanner.next();
> > > >> >         while (result != null) {
> > > >> >
> > > >> > & so on...
> > > >> >
> > > >> > This was performing fairly well until I added another Family that
> > > >> > contains lots of key/value pairs.  My understanding was that adding
> > > >> > another family wouldn't affect performance of this code because I am
> > > >> > explicitly using "Info", but it is.
> > > >> >
> > > >> > Anyway, in this particular use case, I only care about the "Key" of
> > > >> > the row.  I don't need any values from any of the families.  What's
> > > >> > the best way to do this?
> > > >> >
> > > >> > Please let me know.  Thanks.
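
One difference between the shell's count method and the Java getPartners() code in this thread is that the shell sets scanner caching (scan.caching = caching_rows) while the Java code does not. The sketch below is only a back-of-the-envelope model of why that might matter. It assumes each fetch from the region server returns at most `caching` rows, and that a client of this era defaults to a caching value of 1 (an assumption worth checking against your own configuration). The class and method names are hypothetical, and the model ignores the calls that open and close the scanner.

```java
/** Rough estimate of row-fetching calls for a full scan (illustrative only). */
final class ScanRoundTrips {

    /**
     * Simplified model: every fetch returns up to `caching` rows, so a scan
     * over `rows` rows needs ceil(rows / caching) fetches. Scanner open/close
     * calls and the final empty fetch are deliberately left out.
     */
    static long fetches(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division on non-negative values
    }

    public static void main(String[] args) {
        long[] rowCounts = {43, 500};      // table sizes mentioned in the thread
        int[] cachingValues = {1, 10, 100}; // 10 is the shell's caching_rows default
        for (long rows : rowCounts) {
            for (int caching : cachingValues) {
                System.out.printf("rows=%d caching=%d -> about %d fetches%n",
                        rows, caching, fetches(rows, caching));
            }
        }
    }
}
```

In the Java client the corresponding setting is Scan.setCaching(int). Whether it accounts for the 2 seconds reported here is untested; the model only shows how the number of fetches scales with the caching value.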