From: Jeremiah Jordan
To: "user@cassandra.apache.org"
Subject: RE: understanding of native indexes: limitations, potential side effects,...
Date: Wed, 16 May 2012 16:23:51 +0000

The limitation is because the number of columns could be equal to the number of rows. If the number of rows is large, this can become an issue.

-Jeremiah

From: David Vanderfeesten [feestend@gmail.com]
Sent: Wednesday, May 16, 2012 6:58 AM
To: user@cassandra.apache.org
Subject: understanding of native indexes: limitations, potential side effects,...

Hi

I'd like to better understand the limitations of native indexes, their potential side effects, and the scenarios where they are required.

My understanding so far:
- Each node stores the index entries for the data held locally on that node.
- Indexes do not return values in sorted order (the hashes of the indexed row keys define the order).
- Given the design described in the first bullet, a coordinator node receiving a read on a native index needs to send reads to multiple nodes (a set of nodes that together cover at least the complete key space, plus potentially more to satisfy the read consistency level).
- Each write to an indexed column leads to an additional local read of the index in order to update it (kind of obvious, but easily forgotten when tuning your system for a write-only workload).

- When using a WHERE clause in CQL, you need to specify at least one equality condition on a natively indexed column. Additional conditions in the WHERE clause are applied as filters by the coordinator node receiving the CQL query.
- Native indexes do not handle columns with a high number of distinct values across the entire CF very well.
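
A toy model may help make the first four bullets concrete. The sketch below is Python, not Cassandra code: the `Node` and `Cluster` classes, the `_owner` placement rule, and the `state` column are all hypothetical, invented for illustration. It only mirrors the behaviour described above: each node indexes its own rows, a write to the indexed column first reads the old value, and an index query is sent to every node.

```python
from collections import defaultdict


class Node:
    """Toy stand-in for one node: its own rows plus a local index on one column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # row key -> {column: value}
        self.index = defaultdict(set)   # indexed value -> local row keys
        self.local_reads = 0            # reads caused by index maintenance

    def write(self, key, column, value):
        row = self.rows.setdefault(key, {})
        if column == self.indexed_column:
            # Read the old value first so the stale index entry can be removed.
            self.local_reads += 1
            old = row.get(column)
            if old is not None:
                self.index[old].discard(key)
                if not self.index[old]:
                    del self.index[old]
            self.index[value].add(key)
        row[column] = value

    def lookup(self, value):
        # Only row keys stored on this node can be returned.
        return sorted(self.index.get(value, ()))


class Cluster:
    def __init__(self, node_count, indexed_column):
        self.nodes = [Node(indexed_column) for _ in range(node_count)]

    def _owner(self, key):
        # Hypothetical placement rule; a real cluster uses a partitioner.
        return self.nodes[sum(key.encode()) % len(self.nodes)]

    def write(self, key, column, value):
        self._owner(key).write(key, column, value)

    def query(self, value):
        # Fan out: any node may hold matching rows, so all of them are asked.
        contacted = 0
        keys = []
        for node in self.nodes:
            contacted += 1
            keys.extend(node.lookup(value))
        return sorted(keys), contacted


cluster = Cluster(node_count=3, indexed_column="state")
cluster.write("alice", "state", "TX")
cluster.write("bob", "state", "TX")
cluster.write("carol", "state", "UT")
cluster.write("bob", "state", "UT")  # update: the old TX entry is removed

print(cluster.query("TX"))
print(sum(node.local_reads for node in cluster.nodes))
```

In this model the query cost grows with the number of nodes regardless of how many rows match, and every indexed write pays for one extra local read. Consistency levels and replication are left out on purpose.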

Is the above understanding correct and complete?
Some doubts:
- About the limitation on indexing columns with a high number of distinct values:
I assume native indexes are implemented with an internally managed CF per index. With high-cardinality values, in the worst case, the number of rows in the index is identical to the number of rows in the indexed CF. Or are there other reasons for the limitation, and if so, is there a guideline on the maximum cardinality that is still reasonable?
- Are column updates and the corresponding index updates (read + write action) atomic and isolated from concurrent updates?
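
The cardinality doubt above can be checked with a few lines of arithmetic. The sketch below is Python and assumes, as the message does, that an index is stored as one index row per distinct value, holding the keys of the matching rows. The `build_index` and `index_shape` helpers and the sample `users` data are hypothetical.

```python
from collections import defaultdict


def build_index(rows, column):
    """Group row keys by indexed value: one index row per distinct value."""
    index = defaultdict(list)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].append(key)
    return dict(index)


def index_shape(index):
    """Return (number of index rows, average row keys per index row)."""
    if not index:
        return 0, 0.0
    entries = sum(len(keys) for keys in index.values())
    return len(index), entries / len(index)


# Hypothetical data: 1000 rows with one low-cardinality and one unique column.
countries = ["BE", "NL", "FR", "DE"]
users = {
    f"user{i}": {"country": countries[i % 4], "email": f"user{i}@example.com"}
    for i in range(1000)
}

low = index_shape(build_index(users, "country"))
high = index_shape(build_index(users, "email"))
print(low)
print(high)
```

With the low-cardinality column the index has 4 rows of 250 keys each; with the unique column it has 1000 rows of one key each, which is the worst case described in the message: the index is as large as the indexed CF, and each lookup returns at most a single row.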

Txs!

David



