Subject: understanding of native indexes: limitations, potential side effects,...
From: David Vanderfeesten
To: user@cassandra.apache.org
Date: Wed, 16 May 2012 13:58:31 +0200

Hi

I would like to better understand the limitations of native indexes, their potential side effects, and the scenarios where they are required.

My understanding so far:
- Each node's index stores entries only for the data held locally on that node.
- Indexes do not return results in sorted order (the hashes of the indexed row keys define the order).
- Given the design described in the first bullet, a coordinator node receiving a read on a native index needs to send reads to multiple nodes (a set of nodes that together cover at least the complete key space, plus potentially more to satisfy the read consistency level).
- Each write to an indexed column leads to an additional local read in order to update the index (fairly obvious, but easily forgotten when tuning a system for a write-only workload).
- When using a WHERE clause in CQL, you need to specify at least one equality condition on a natively indexed column. Additional conditions in the WHERE clause are then applied as filters by the coordinator node that receives the CQL query.
- Native indexes do not handle columns with a high number of distinct values across the entire CF (high cardinality) very well.
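A toy Python model of the bullets above, only to make the mental model concrete; it is not Cassandra's code and it models only the behaviour as described here. Assumptions: a single replica per row (no replication, no consistency levels), placement by an MD5 token of the row key modulo the node count (loosely in the spirit of RandomPartitioner), and hypothetical names `Node`, `owner` and `coordinator_query`. It shows the node-local index, the fan-out of an index read to every node, the read-before-write on an indexed column, and extra WHERE conditions applied as a post-filter.

```python
import hashlib
from collections import defaultdict


class Node:
    """Toy node: stores the rows it owns plus a node-local index over one column."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                 # row_key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> local row keys with that value
        self.local_reads = 0           # read-before-write lookups performed

    def write(self, row_key, column, value, indexed_column):
        row = self.rows.setdefault(row_key, {})
        if column == indexed_column:
            # Read-before-write: look up the old value so its stale index entry is removed.
            self.local_reads += 1
            old = row.get(column)
            if old is not None:
                self.index[old].discard(row_key)
            self.index[value].add(row_key)
        row[column] = value

    def query(self, value):
        # Equality lookup against this node's local index only.
        return [(key, self.rows[key]) for key in self.index.get(value, set())]


def owner(nodes, row_key):
    # Assumed placement: MD5 token of the row key modulo node count (one replica, no ring).
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    return nodes[token % len(nodes)]


def coordinator_query(nodes, value, extra_filter=lambda row: True):
    # Each index only covers local data, so the coordinator must ask every node.
    matches = []
    for node in nodes:
        for key, row in node.query(value):
            if extra_filter(row):  # additional WHERE conditions applied as a filter
                matches.append(key)
    return matches  # order follows nodes and hash sets, not any sort order


nodes = [Node(f"n{i}") for i in range(3)]
users = {"alice": ("CA", 30), "bob": ("NY", 25), "carol": ("CA", 41), "dave": ("CA", 19)}
for key, (state, age) in users.items():
    node = owner(nodes, key)
    node.write(key, "state", state, indexed_column="state")
    node.write(key, "age", age, indexed_column="state")

print(sorted(coordinator_query(nodes, "CA", lambda row: row["age"] > 20)))  # ['alice', 'carol']
```

The index lookup returns keys in whatever order the nodes and hash sets produce, which is why the example sorts before printing.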

Is the above understanding correct and complete?
Some doubts:
- About the limitation on indexing columns with a high number of distinct values:
I assume native indexes are implemented as an internally managed CF per index. With high-cardinality values, in the worst case, the number of rows in the index is identical to the number of rows in the indexed CF. Or are there other reasons for the limitation, and if so, is there a guideline on the maximum cardinality that is still reasonable?
- Are column updates and the corresponding index updates (read + write) atomic and isolated from concurrent updates?
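A quick sketch of the worst case described in the first question, assuming (as the question does) that each index is stored as an internal CF with one index row per distinct value whose columns are the matching base row keys. `build_index_cf` and `index_stats` are hypothetical helpers, and the data is made up for illustration.

```python
from collections import defaultdict


def build_index_cf(rows, column):
    """Hypothetical index CF: one index row per distinct value; its entries are base row keys."""
    index_cf = defaultdict(list)
    for key, cols in rows.items():
        index_cf[cols[column]].append(key)
    return index_cf


def index_stats(rows, column):
    index_cf = build_index_cf(rows, column)
    return {
        "base_rows": len(rows),
        "index_rows": len(index_cf),
        "avg_keys_per_index_row": len(rows) / len(index_cf),
    }


# Made-up data: 'gender' has 2 distinct values, 'email' is unique per row.
rows = {f"user{i}": {"gender": "MF"[i % 2], "email": f"user{i}@example.com"} for i in range(1000)}
print(index_stats(rows, "gender"))  # {'base_rows': 1000, 'index_rows': 2, 'avg_keys_per_index_row': 500.0}
print(index_stats(rows, "email"))   # {'base_rows': 1000, 'index_rows': 1000, 'avg_keys_per_index_row': 1.0}
```

For a unique column such as `email`, each index row holds a single key, yet under the node-local design described in the first bullets an equality query must still contact enough nodes to cover the whole key space to return at most one row.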

Thanks!
David