incubator-cassandra-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Composite Column Types Storage
Date Mon, 17 Sep 2012 14:44:44 GMT
Yes Aaron, I was not clear about bloom filters. I was thinking about the
column bloom filters for the case where I specify an exact value for Part1
of the composite column and a start/end value for Part2.
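
To make the query shape concrete, here is a minimal Python sketch. It is not
Cassandra code: the sample data and the names columns, get_by_name and
slice_columns are made up for illustration, and a plain set stands in for a
column bloom filter. It only shows why an exact-name membership test cannot
answer "Part1 fixed, Part2 from a start value", while sorted order can.

```python
from bisect import bisect_left

# Hypothetical row: composite column names kept sorted as (part1, part2) tuples.
columns = sorted([("a", 1), ("a", 5), ("a", 9), ("b", 2), ("b", 7), ("c", 3)])

# Stand-in for a column bloom filter: it can only answer
# "is this exact, full column name present?".
exact_names = set(columns)

def get_by_name(part1, part2):
    """Exact-name lookup: the only kind of query a membership filter can help."""
    return (part1, part2) in exact_names

def slice_columns(part1, start_part2, limit):
    """Fixed part1, range on part2: relies on sort order, not on membership."""
    i = bisect_left(columns, (part1, start_part2))
    out = []
    while i < len(columns) and columns[i][0] == part1 and len(out) < limit:
        out.append(columns[i])
        i += 1
    return out
```

For example, slice_columns("a", 5, 10) walks forward from the first column at
or after ("a", 5) and stops as soon as part1 changes.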

It is slowly dawning on me that I need a super-column to use column blooms
effectively, but at the same time I don't want the entire sub-column list
deserialized.

In fact, for my use-case I also do not need a column sampling index. Rather,
I would much prefer a multi-level skip-list.
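
For reference, this is roughly what I mean by a sampling index, as a small
Python sketch. It is not how Cassandra implements its column index; BLOCK_SIZE,
build_index and find_start are hypothetical names. One index entry is kept per
block of columns, the index is binary-searched, and then the block is scanned.

```python
from bisect import bisect_right

BLOCK_SIZE = 4  # hypothetical: number of columns covered by one index entry

def build_index(sorted_names, block_size=BLOCK_SIZE):
    """Sample the first column name of every block, with its position."""
    return [(sorted_names[i], i) for i in range(0, len(sorted_names), block_size)]

def find_start(sorted_names, index, start_name):
    """Return the position of the first column whose name is >= start_name."""
    if not index:
        return 0
    first_names = [name for name, _ in index]
    # Last block whose first name is <= start_name (or block 0 if none).
    block = max(bisect_right(first_names, start_name) - 1, 0)
    pos = index[block][1]
    # Linear scan from the start of that block.
    while pos < len(sorted_names) and sorted_names[pos] < start_name:
        pos += 1
    return pos
```

A multi-level skip-list would, in effect, stack further and coarser levels of
such samples on top of this single level, so the final linear scan stays short
even for very wide rows.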

Is there a way to customize how Cassandra writes/reads its key/column
indexes to SSTables? Any hooks/APIs available as of now would be
greatly helpful.

On Fri, Sep 14, 2012 at 10:33 AM, aaron morton <aaron@thelastpickle.com> wrote:

> Range queries do not use bloom filters.
>
> Are you talking about row range queries ? Or a slice of columns in a row ?
>
> If you are getting a slice of columns from a single row, a bloom filter is
> used to locate the row.
> If you are getting a slice of columns from a range of rows, the bloom
> filter is used to locate the first row. After that it is a scan.
>
> There are also row level bloom filters for columns on a row. These are
> used when you query columns by name. If you are doing a slice with a start,
> the bloom filter is not used; instead the row level column index is used (if
> present).
>
> Hope that helps.
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> Thanks for the clarification. Even though compression solves the disk space
> issue, we might still have Memtable bloat, right?
>
> There is another issue to be handled for us. The queries are always going
> to be range queries with an exact match on part 1 and a range on part 2 of
> the composite columns.
>
> Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit>
>
> Range queries do not use bloom filters. That holds for composite columns
> as well, right? I believe I will end up writing BF bytes only
> to skip them later.
>
> If sharing had been possible, then <Column-part-1> alone could have gone
> into the bloom-filter, speeding up my queries really effectively.
>
> But as I understand it, there are many levels of nesting possible in a
> composite type, and special-casing at every level is a big task.
>
> Maybe special-casing for the top level or the first part would be a good start?
>
> --
> Ravi
>
> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>
>> > Is every <string>/<id> combination stored separately in disk
>>
>> Yes, each combination is stored separately on disk (the storage engine
>> itself doesn't have special casing for composite columns, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non-issue.
>>
>> --
>> Sylvain
>>
>
>
>
