From: Maxim Potekhin
Date: Mon, 07 May 2012 19:15:17 -0400
To: user@cassandra.apache.org
Subject: Re: Cassandra search performance
Message-ID: <4FA85785.60400@bnl.gov>
References: <4F9DCFAB.7010406@bnl.gov>

Thanks for the comments, much appreciated.

Maxim

On 5/7/2012 3:22 AM, David Jeske wrote:
> On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin <potekhin@bnl.gov> wrote:
>
> > Looking at your example, as I think you understand, you forgo
> > indexes by combining two conditions in one query, thinking along
> > the lines of what is often done in RDBMS. A scan is expected in
> > this case, and there is no magic to avoid it.
>
> This sounds like a misunderstanding of how RDBMSs work. If you
> combine two conditions in a single SQL query, the SQL execution
> optimizer looks at the cardinality of any indices. If it can
> successfully predict that one of the conditions significantly reduces
> the set of rows that would be considered (such as a status match
> having 200 hits vs 1M rows in the table), then it selects this index
> for the first iteration, and each index hit causes a record lookup
> which is then tested for the other conditions. (This is one of
> several query-execution types RDBMS systems use.)
>
> I'm no Cassandra expert, so I don't know what it does WRT
> index selection, but from the page written on secondary indices, it
> seems like if you just query on status and do the other filtering
> yourself, it'll probably do what you want...
>
> http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>
> > However, if this query is important, you can easily index on two
> > conditions, using a composite type (look it up), or string
> > concatenation for a quick and easy solution.
>
> This is not necessarily a good idea. Creating a composite index
> explodes the index size unnecessarily. If a condition can reduce a
> query to 200 records, there is no need to have a composite index
> including another condition.

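
For reference, the index-then-filter strategy described in the quoted reply can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular RDBMS or Cassandra actually selects indexes. The helper names (build_index, query), the in-memory list of dicts, and the "site" column are assumptions made up for the example; only the "status" condition with 200 hits comes from the thread, and the table is shrunk to 10,000 rows.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the list of row ids that hold it."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return index


def query(rows, indexes, conditions):
    """Answer an AND of equality conditions (column -> required value).

    The indexed condition with the fewest hits drives the lookup; each
    fetched row is then tested against all of the conditions.
    """
    indexed = [col for col in conditions if col in indexes]
    if indexed:
        # Pick the most selective index, i.e. the one with the fewest hits.
        driver = min(indexed,
                     key=lambda col: len(indexes[col].get(conditions[col], [])))
        candidates = indexes[driver].get(conditions[driver], [])
    else:
        # No usable index: fall back to scanning every row.
        candidates = range(len(rows))
    return [rows[i] for i in candidates
            if all(rows[i][col] == val for col, val in conditions.items())]


# Hypothetical data: 200 of 10,000 rows have status "failed".
rows = [{"status": "failed" if i % 50 == 0 else "ok",
         "site": "A" if i % 3 == 0 else "B"}
        for i in range(10_000)]
indexes = {col: build_index(rows, col) for col in ("status", "site")}

conditions = {"status": "failed", "site": "B"}
matches = query(rows, indexes, conditions)
```

Here the "status" index drives the lookup, so only 200 rows are fetched and tested against the "site" condition. No composite index on (status, site) is needed to avoid a full scan.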