Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
From: Jeremiah Jordan <JEREMIAH.JORDAN@morningstar.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: understanding of native indexes: limitations, potential side
 effects,...
Thread-Topic: understanding of native indexes: limitations, potential side
 effects,...
Date: Wed, 16 May 2012 16:23:51 +0000
Message-ID: <63CCA5D3F3175843B5C153AD218C2FBF08E498@MSEXCHM83.morningstar.com>
References: 
 <CAPjXCuw7+maeHmf4aQ93thdy3xPs4JbKE6VKFomiNChV4NQZ8Q@mail.gmail.com>
In-Reply-To: 
 <CAPjXCuw7+maeHmf4aQ93thdy3xPs4JbKE6VKFomiNChV4NQZ8Q@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
Content-Type: multipart/alternative;
	boundary="_000_63CCA5D3F3175843B5C153AD218C2FBF08E498MSEXCHM83mornings_"
MIME-Version: 1.0

--_000_63CCA5D3F3175843B5C153AD218C2FBF08E498MSEXCHM83mornings_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

The limitation exists because the number of columns could equal the number of rows. If the number of rows is large, this can become an issue.

-Jeremiah

________________________________
From: David Vanderfeesten [feestend@gmail.com]
Sent: Wednesday, May 16, 2012 6:58 AM
To: user@cassandra.apache.org
Subject: understanding of native indexes: limitations, potential side effects,...

Hi

I would like to better understand the limitations of native indexes, their potential side effects, and the scenarios where they are required.

My understanding so far:
- Each node stores index entries only for the data held locally on that node.
- Indexes do not return values in sorted order; the order is defined by the hashes of the indexed row keys.
- Given the design in the first bullet, a coordinator node receiving a read on a native index must send reads to multiple nodes: a set of nodes that together cover at least the complete key space, plus potentially more to satisfy the read consistency level.
- Each write to an indexed column leads to an additional local read to update the index (fairly obvious, but easily forgotten when tuning a system for a write-only workload).
- When using a WHERE clause in CQL, you must specify at least one equality condition on a natively indexed column. Additional conditions in the WHERE clause are then applied as filters by the coordinator node receiving the CQL query.
- Native indexes are not well suited to columns with a high number of distinct values across the entire CF.
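The mechanics described in the bullets above can be sketched as a toy, single-process Python model. Everything in it is hypothetical: the class and method names are invented for illustration, replication (and therefore the extra replicas needed for a consistency level) is left out, and rows are assigned to nodes by an MD5 hash taken modulo the node count rather than by real token ranges. It shows a node-local index, the read-before-write on an indexed column, a coordinator that has to ask every node, extra equality conditions applied as a filter, and results returned in hash order rather than key order.

```python
import hashlib
from collections import defaultdict


def token(row_key: str) -> int:
    """Hash a row key (MD5, in the spirit of a hash-based partitioner)."""
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16)


class Node:
    """Holds some rows plus an index over *only those* rows (node-local index)."""

    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.rows = {}                  # row_key -> {column: value}
        self.index = defaultdict(set)   # indexed value -> row keys stored on this node

    def write(self, row_key: str, column: str, value: str) -> None:
        row = self.rows.setdefault(row_key, {})
        if column == self.indexed_column:
            old = row.get(column)       # extra local read: find the old value...
            if old is not None:
                self.index[old].discard(row_key)  # ...so its stale index entry can be dropped
            self.index[value].add(row_key)
        row[column] = value

    def local_lookup(self, value: str) -> list:
        return [(k, self.rows[k]) for k in self.index.get(value, ())]


class Cluster:
    """Toy cluster with RF=1: each row lives on one node chosen from its hash."""

    def __init__(self, n_nodes: int, indexed_column: str):
        self.nodes = [Node(indexed_column) for _ in range(n_nodes)]

    def write(self, row_key: str, column: str, value: str) -> None:
        self.nodes[token(row_key) % len(self.nodes)].write(row_key, column, value)

    def query(self, value: str, **extra_equals: str) -> list:
        """Coordinator: equality on the indexed column, plus optional extra filters."""
        hits = []
        for node in self.nodes:         # any node may hold matches, so ask them all
            hits.extend(node.local_lookup(value))
        hits = [(k, r) for k, r in hits
                if all(r.get(c) == v for c, v in extra_equals.items())]
        return [k for k, _ in sorted(hits, key=lambda kr: token(kr[0]))]  # hash order


c = Cluster(3, "state")
c.write("alice", "state", "CA")
c.write("alice", "age", "30")
c.write("bob", "state", "CA")
c.write("bob", "age", "40")
c.write("bob", "state", "TX")           # read-before-write removes bob from the CA entry
print(c.query("CA"))                    # ['alice']
print(c.query("CA", age="40"))          # []
```

The scatter to every node is the point of the model: because each index only covers local rows, no single node can answer an index query on its own.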

Is the above understanding correct and complete?
Some doubts:
- About the limitation on indexing columns with a high number of distinct values:
I assume native indexes are implemented with an internally managed CF per index. With high-cardinality values, in the worst case the number of rows in the index equals the number of rows in the indexed CF. Or are there other reasons for the limitation? If so, is there a guideline on the maximum cardinality that is still reasonable?
- Are column updates and the corresponding index updates (read + write) atomic and isolated from concurrent updates?
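The two doubts above can be made concrete with a small, hypothetical Python model. The first part builds an index CF under the assumption stated in the first doubt (one internal CF per index, keyed by the indexed value, with one column per base row key); this layout and every name in the code are assumptions for illustration, not Cassandra's API. It shows both extremes: unique values give as many index rows as base rows, while very few distinct values give a handful of index rows, each with as many columns as there are matching base rows. The second part simulates the read-then-write index update with two interleaved writers, to show what a stale index entry would look like if the two steps were not isolated. It does not say whether Cassandra actually isolates them.

```python
from collections import defaultdict

# --- Doubt 1: shape of a per-index CF (assumed model: row key = indexed value,
#     one column per base row key that currently has that value) ---

def build_index_cf(base_cf: dict, column: str) -> dict:
    """Build the hypothetical index CF for `column` from {row_key: {col: val}}."""
    index_cf = defaultdict(dict)
    for row_key, row in base_cf.items():
        if column in row:
            index_cf[row[column]][row_key] = b""  # column name carries the data; value is empty
    return dict(index_cf)


def index_shape(index_cf: dict) -> tuple:
    """Return (number of index rows, column count of the widest index row)."""
    return len(index_cf), max((len(cols) for cols in index_cf.values()), default=0)


base = {f"user{i}": {"email": f"user{i}@example.com", "country": ("BE", "US")[i % 2]}
        for i in range(10_000)}
print(index_shape(build_index_cf(base, "email")))    # (10000, 1): one index row per base row
print(index_shape(build_index_cf(base, "country")))  # (2, 5000): few rows, very wide ones

# --- Doubt 2: why isolating the read + write matters (hypothetical store) ---

class LocalIndexedStore:
    def __init__(self):
        self.value = {}                 # row_key -> current indexed value
        self.index = defaultdict(set)   # indexed value -> row keys

    def read_old(self, row_key):
        return self.value.get(row_key)

    def apply(self, row_key, old, new):
        if old is not None:
            self.index[old].discard(row_key)  # drop the entry seen by the earlier read
        self.index[new].add(row_key)
        self.value[row_key] = new

    def values_indexing(self, row_key):
        return sorted(v for v, keys in self.index.items() if row_key in keys)


racy = LocalIndexedStore()
racy.apply("r1", None, "A")
old_1 = racy.read_old("r1")   # writer 1 reads A
old_2 = racy.read_old("r1")   # writer 2 also reads A, before writer 1 has written
racy.apply("r1", old_1, "B")
racy.apply("r1", old_2, "C")  # removes A again; nobody removes B
print(racy.value["r1"], racy.values_indexing("r1"))      # C ['B', 'C']

serial = LocalIndexedStore()
serial.apply("r1", None, "A")
for new in ("B", "C"):        # each read + write runs as one isolated step
    serial.apply("r1", serial.read_old("r1"), new)
print(serial.value["r1"], serial.values_indexing("r1"))  # C ['C']
```

In the interleaved run, a lookup on `B` would still return `r1` even though its current value is `C`; in the serialized run the index stays consistent.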

Txs!

David


--_000_63CCA5D3F3175843B5C153AD218C2FBF08E498MSEXCHM83mornings_--