From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Composite Column Types Storage
Date: Tue, 18 Sep 2012 20:44:23 +1200

> It is slowly dawning on me that I need a super-column to use column blooms effectively and at the same time don't want the entire sub-column list deserialized.
Queries by name use the row level bloom filter, regardless of the CF type.

> In fact, for my use-case I also do not need a column sampling index. Rather I would much prefer a multi-level skip-list
Are you thinking about performance or functionality? If it's performance, do you have an example of something that needs optimisation?

> Is there a way to customize how Cassandra writes/reads its key/column indexes to SSTables?
No.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:44 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

Yes Aaron, I was not clear about Bloom Filters. I was thinking about the column bloom filters when I specify an absolute value for Part1 of the composite column and a start/end value for Part2 of the composite column.

It is slowly dawning on me that I need a super-column to use column blooms effectively and at the same time don't want the entire sub-column list deserialized.

In fact, for my use-case I also do not need a column sampling index. Rather I would much prefer a multi-level skip-list.

Is there a way to customize how Cassandra writes/reads its key/column indexes to SSTables? Any hooks/API available as of now would be greatly helpful.

On Fri, Sep 14, 2012 at 10:33 AM, aaron morton <aaron@thelastpickle.com> wrote:
> Range queries do not use bloom filters.
Are you talking about row range queries? Or a slice of columns in a row?

If you are getting a slice of columns from a single row, a bloom filter is used to locate the row.
If you are getting a slice of columns from a range of rows, the bloom filter is used to locate the first row. After that, it is a scan.

There are also row level bloom filters for columns on a row. These are used when you get columns by name. If you are doing a slice with a start, the bloom filter is not used; instead the row level column index is used (if present).
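
A toy sketch of that distinction, not Cassandra's actual read path: `ToyBloomFilter` and `ToyRow` are hypothetical names, and a sorted list with `bisect` stands in for the row level column index. It only illustrates why a bloom filter helps a by-name lookup but cannot help a slice that starts at some column.

```python
import bisect
import hashlib


class ToyBloomFilter:
    """Tiny bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyRow:
    """Hypothetical model of one row: sorted column names plus a column bloom filter."""

    def __init__(self, columns):
        self.names = sorted(columns)
        self.values = dict(columns)
        self.bloom = ToyBloomFilter()
        for name in self.names:
            self.bloom.add(name)

    def get_by_name(self, name):
        # By-name lookup: the bloom filter can cheaply rule the column out.
        if not self.bloom.might_contain(name):
            return None
        return self.values.get(name)

    def slice_from(self, start, limit):
        # Slice with a start: membership tests cannot answer "what sorts at or
        # after start?", so seek in the sorted names instead (the stand-in for
        # the row level column index).
        i = bisect.bisect_left(self.names, start)
        return [(n, self.values[n]) for n in self.names[i:i + limit]]
```

With `row = ToyRow({"a": 1, "b": 2, "c": 3})`, `row.get_by_name("b")` goes through the bloom filter, while `row.slice_from("b", 2)` never consults it.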

Hope that helps.


-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

Thanks for the clarification. Even though compression solves the disk space issue, we might still have Memtable bloat, right?

There is another issue to be handled for us. The queries are always going to be range queries with absolute match on part1 and range on part 2 of the composite columns.

Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit>
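
A minimal sketch of that query shape, assuming the columns of one row are held as a list sorted by the `(part1, part2)` composite name. `slice_composite` and the sample data are hypothetical and only illustrate "exact match on part1, range on part2, with a limit"; they are not a Cassandra API.

```python
import bisect


def slice_composite(sorted_columns, part1, start_part2, limit):
    """Return up to `limit` (name, value) pairs with name == (part1, p2), p2 >= start_part2."""
    names = [name for name, _ in sorted_columns]
    # Seek to the first column at or after (part1, start_part2).
    i = bisect.bisect_left(names, (part1, start_part2))
    result = []
    for name, value in sorted_columns[i:]:
        # Stop once the first component changes or the limit is reached.
        if name[0] != part1 or len(result) == limit:
            break
        result.append((name, value))
    return result


# Hypothetical row contents, sorted by composite name.
columns = sorted({
    ("inbox", 3): "c",
    ("inbox", 1): "a",
    ("sent", 2): "x",
    ("inbox", 7): "d",
}.items())
```

Because the names are sorted, the slice is one seek followed by a sequential read, which is why a membership structure such as a bloom filter has nothing to contribute to it.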

Range queries do not use bloom filters. That holds for composite columns too, right? I believe I will end up writing BF bytes only to skip them later.

If sharing had been possible, then <Column-part-1> alone could have gone into the bloom-filter, speeding up my queries really effectively.

But as I understand, there are many levels of nesting possible in a composite type, and special-casing at every level is a big task.

Maybe special-casing for the top level or the first part would be a good start?

--
Ravi

On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
> Is every <string>/<id> combination stored separately on disk?

Yes, each combination is stored separately on disk (the storage engine
itself doesn't have special casing for composite column, at least not
yet). But as far as disk space is concerned, I suspect that sstable
compression makes this largely a non issue.

--
Sylvain
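
A rough sketch of why the repeated first component costs little after compression. It assumes a composite name is laid out as a sequence of components, each written as a 2-byte length, the value bytes, and one end-of-component byte. `encode_composite` is a hypothetical helper, numeric ids are encoded as decimal text purely for simplicity, and `zlib` is only a stand-in for whatever sstable compressor is configured.

```python
import struct
import zlib


def encode_composite(*components):
    """Encode each component as <2-byte length><bytes><end-of-component byte> (assumed layout)."""
    out = bytearray()
    for comp in components:
        raw = comp if isinstance(comp, bytes) else str(comp).encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return bytes(out)


# Every <string>/<id> combination is a separate, full column name.
names = [encode_composite("some-string", i) for i in range(1000)]
prefix = encode_composite("some-string")

raw_bytes = b"".join(names)
repeated_prefix_bytes = len(prefix) * len(names)  # bytes spent re-writing part 1
compressed_bytes = zlib.compress(raw_bytes)
```

Comparing `len(raw_bytes)`, `repeated_prefix_bytes` and `len(compressed_bytes)` shows how much of the on-disk size is the repeated first component and how much of it a general-purpose compressor recovers.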
