Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: aaron morton <aaron@thelastpickle.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_4A248262-38F5-4860-A4B5-F1DB7A9B29F2"
Message-Id: <83DE17DE-676E-4E5C-A388-C1C99618089E@thelastpickle.com>
Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\))
Subject: Re: Composite Column Types Storage
Date: Tue, 18 Sep 2012 20:44:23 +1200
References: 
 <CAGW2whSNNmMMP69FtqphJdZLdQAvfGdLg4WFSCan30wynPuKtg@mail.gmail.com>
 <CAKkz8Q2WJobpHeY-pp6JEaYhZ-8v0eZkK+jbJYdi1yuEg1oUNA@mail.gmail.com>
 <CAGW2whResKvaotLmzm99eLPuPyO7AFqaMyQ7du=ph80f+aDWLg@mail.gmail.com>
 <A333664D-CB55-4C42-BD20-890EF428B384@thelastpickle.com>
 <CAGW2whRAAbpXrQ1hAR255avyzHnGUBBC1O=k3e+h-O+aLdV2xA@mail.gmail.com>
To: user@cassandra.apache.org
In-Reply-To: 
 <CAGW2whRAAbpXrQ1hAR255avyzHnGUBBC1O=k3e+h-O+aLdV2xA@mail.gmail.com>


--Apple-Mail=_4A248262-38F5-4860-A4B5-F1DB7A9B29F2
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

> It is slowly dawning on me that I need a super-column to use column =
blooms effectively and at the same time don't want the entire sub-column =
list deserialized.=20
Queries by name use the row-level bloom filter, regardless of the CF =
type.=20

> In fact, for my use-case I also do not need a column sampling index. =
Rather I would much prefer a multi-level skip-list
Are you thinking about performance or functionality? If it's =
performance, do you have an example of something that needs =
optimisation?

> Is there a way to customize how Cassandra writes/reads its key/column =
indexes to SSTables?
No.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:44 AM, Ravikumar Govindarajan =
<ravikumar.govindarajan@gmail.com> wrote:

> Yes, Aaron, I was not clear about bloom filters. I was thinking about =
the column bloom filters when I specify an absolute value for Part1 of =
the composite column and a start/end value for Part2 of the composite =
column.
>=20
> It is slowly dawning on me that I need a super-column to use column =
blooms effectively and at the same time don't want the entire sub-column =
list deserialized.=20
>=20
> In fact, for my use-case I also do not need a column sampling index. =
Rather I would much prefer a multi-level skip-list
>=20
> Is there a way to customize how Cassandra writes/reads its key/column =
indexes to SSTables? Any hooks/API available as of now would be very =
helpful.
>=20
> On Fri, Sep 14, 2012 at 10:33 AM, aaron morton =
<aaron@thelastpickle.com> wrote:
>> Range queries do not use bloom filters.=20
> Are you talking about row range queries? Or a slice of columns in a =
row?=20
>=20
> If you are getting a slice of columns from a single row, a bloom =
filter is used to locate the row.=20
> If you are getting a slice of columns from a range of rows, the bloom =
filter is used to locate the first row. After that it is a scan.=20
>=20
> There are also row-level bloom filters for columns on a row. These are =
used when you query columns by name. If you are doing a slice with a =
start, the bloom filter is not used; instead the row-level column index =
is used (if present).=20
>=20
> Hope that helps.=20
>=20
>=20
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>=20
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan =
<ravikumar.govindarajan@gmail.com> wrote:
>=20
>> Thanks for the clarification. Even though compression solves the disk =
space issue, we might still have Memtable bloat, right?
>>=20
>> There is another issue to be handled for us. The queries are always =
going to be range queries with absolute match on part1 and range on part =
2 of the composite columns
>>=20
>> Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit>=20
>>=20
>> Range queries do not use bloom filters. That holds for composite =
columns too, right? I believe I will end up writing BF bytes only to =
skip them later.
>>=20
>> If sharing had been possible, then <Column-part-1> alone could have =
gone into the bloom-filter, speeding up my queries really effectively.
>>=20
>> But as I understand it, there are many levels of nesting possible in a =
composite type, and special-casing at every level is a big task.
>>=20
>> Maybe special-casing for the top level or the first part would be a =
good start?
>>=20
>> --
>> Ravi
>>=20
>> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne =
<sylvain@datastax.com> wrote:
>> > Is every <string>/<id> combination stored separately in disk
>>=20
>> Yes, each combination is stored separately on disk (the storage =
engine
>> itself doesn't have special casing for composite columns, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non-issue.
>>=20
>> --
>> Sylvain
>>=20
>=20
>=20



--Apple-Mail=_4A248262-38F5-4860-A4B5-F1DB7A9B29F2--