To: user@hbase.apache.org
From: Luke Forehand
Subject: Re: Secondary Index versus Full Table Scan
Date: Tue, 3 Aug 2010 16:36:50 +0000 (UTC)

Edward Capriolo writes:

> Generally speaking: If you are doing full range scans of a table
> indexes will not help. Adding indexes will make the performance worse,
> it will take longer to load your data and now fetching the data will
> involve two lookups instead of one.
>
> If you are doing full range scans adding more nodes should result in
> linear scale up.

Edward,

Can you clarify what "full range scan" means? I am not doing "full" range
scans, but I am doing relatively large range scans (3 million records), so I
think what you are saying applies. Thanks for the insight.

We initially implemented the secondary index because we needed our main data
sorted by multiple dimensions for various use cases. Now I'm thinking it may
be better to keep multiple copies of our main data, each sorted a different
way, to avoid the two lookups.

So I'm faced with two options right now: keep multiple copies of the data,
sorted in multiple ways, and do range scans; or buy a lot more servers and do
full scans. Given these two choices, do people have general recommendations on
which makes the most sense?

Thanks!
-Luke
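
To make the "two lookups" trade-off concrete, here is a minimal sketch in plain Python. It only simulates sorted tables with in-memory lists and does not use the HBase client API. The sample records, the (customer, timestamp) key layout, and every function name are hypothetical, made up for illustration. It contrasts an index table that points back to the main table with a second copy of the data sorted by the alternate key.

```python
from bisect import bisect_left

# Hypothetical sample records: (record_id, timestamp, customer).
RECORDS = [
    ("r1", 100, "acme"),
    ("r2", 105, "zeta"),
    ("r3", 110, "acme"),
    ("r4", 120, "beta"),
    ("r5", 130, "acme"),
]


def range_scan(rows, start, stop):
    """Return the (key, value) pairs with start <= key < stop.

    `rows` must be sorted by key, mimicking a scan over a sorted table.
    """
    keys = [key for key, _ in rows]
    return rows[bisect_left(keys, start):bisect_left(keys, stop)]


# Option A: main table keyed by record id, plus a secondary index
# keyed by (customer, timestamp) whose value is only the record id.
main_table = {rid: (ts, cust) for rid, ts, cust in RECORDS}
index_table = sorted(((cust, ts), rid) for rid, ts, cust in RECORDS)


def query_via_index(customer, t0, t1):
    """Scan the index, then fetch each matching row from the main table."""
    hits = range_scan(index_table, (customer, t0), (customer, t1))
    rows = [(rid, *main_table[rid]) for _, rid in hits]
    lookups = 1 + len(hits)  # one index scan plus one point get per hit
    return rows, lookups


# Option B: a second full copy of the data, sorted by (customer, timestamp),
# so the scan result already contains the whole row.
copy_by_customer = sorted(((cust, ts), (rid, ts, cust)) for rid, ts, cust in RECORDS)


def query_via_copy(customer, t0, t1):
    """One contiguous range scan over the re-sorted copy."""
    hits = range_scan(copy_by_customer, (customer, t0), (customer, t1))
    return [row for _, row in hits], 1


if __name__ == "__main__":
    print(query_via_index("acme", 100, 125))
    print(query_via_copy("acme", 100, 125))
```

Both queries return the same rows. In this toy model the index path costs one extra point lookup per matching record, while the copy pays for that up front with extra storage and an extra write per record.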