incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Composite Column Types Storage
Date Tue, 18 Sep 2012 08:44:23 GMT
> It is slowly dawning on me that I need a super-column to use column blooms effectively and at the same time don't want the entire sub-column list deserialized.
Queries by name use the row level bloom filter, regardless of the CF type. 

> In fact, for my use-case I also do not need a column sampling index. Rather I would much prefer a multi-level skip-list.
Are you thinking about performance or functionality? If it's performance, do you have an example of something that needs optimisation?

> Is there a way to customize how Cassandra writes/reads its key/column indexes to SSTables?
No.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 2:44 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

> Yes, Aaron, I was not clear about bloom filters. I was thinking about the column bloom filters when I specify an absolute value for Part1 of the composite column and a start/end value for Part2 of the composite column.
> 
> It is slowly dawning on me that I need a super-column to use column blooms effectively and at the same time don't want the entire sub-column list deserialized.
> 
> In fact, for my use-case I also do not need a column sampling index. Rather I would much prefer a multi-level skip-list.
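To make the contrast with a single sampled index concrete, here is a minimal Python sketch of a multi-level index over sorted column names that is searched the way a skip list is: hop far on a coarse level, then shorter hops on finer levels. It is a deterministic, array-backed variant rather than a randomized linked skip list, and the class name and `fanout` parameter are hypothetical. Per Aaron's reply, Cassandra offers no hook to plug a structure like this into SSTables; it only illustrates the idea.

```python
class StaticSkipIndex:
    """Multi-level index over sorted keys, searched the way a skip list is.

    Each level (coarsest first) lets the search hop `step` keys at a time,
    where step = fanout, fanout**2, ...; this is a deterministic,
    array-backed variant rather than a randomized linked skip list.
    """

    def __init__(self, keys, fanout=4):
        self.keys = sorted(keys)
        self.steps = []
        step = fanout
        while step < len(self.keys):
            self.steps.append(step)
            step *= fanout
        self.steps.reverse()  # coarsest level first

    def lower_bound(self, target):
        """Return the position of the first key >= target (len(keys) if none)."""
        pos = 0
        for step in self.steps:
            # Hop along this level while the next sampled key is still < target.
            while pos + step < len(self.keys) and self.keys[pos + step] < target:
                pos += step
        # Level 0: finish with a short linear scan.
        while pos < len(self.keys) and self.keys[pos] < target:
            pos += 1
        return pos


idx = StaticSkipIndex([f"k{i:03d}" for i in range(100)])
print(idx.steps, idx.lower_bound("k050"), idx.lower_bound("z"))  # [64, 16, 4] 50 100
```

Each level needs at most `fanout - 1` hops, so a lookup touches O(fanout * levels) keys instead of binary-searching one flat sample.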
> 
> Is there a way to customize how Cassandra writes/reads its key/column indexes to SSTables? Any hooks/API that are available as of now would be greatly helpful.
> 
> On Fri, Sep 14, 2012 at 10:33 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> Range queries do not use bloom filters. 
> Are you talking about row range queries? Or a slice of columns in a row?
> 
> If you are getting a slice of columns from a single row, a bloom filter is used to locate the row.
> If you are getting a slice of columns from a range of rows, the bloom filter is used to locate the first row. After that it is a scan.
> 
> There are also row level bloom filters for columns on a row. These are used when you request columns by name. If you are doing a slice with a start column, the bloom filter is not used; instead the row level column index is used (if present).
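The distinction Aaron draws can be shown with a toy Python model of a single row: a by-name query can ask a column bloom filter first and skip names it rules out, while a slice with a start column cannot use the filter and instead binary-searches a sampled column index. The class names, the block size of 4, and the SHA-256-based hashing are illustrative assumptions, not Cassandra's on-disk format.

```python
import bisect
import hashlib


class ColumnBloomFilter:
    """Toy bloom filter over column names: k hash positions in an m-slot array."""

    def __init__(self, m_bits=1024, k=3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits)  # one byte per "bit" keeps the sketch simple

    def _positions(self, name):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, name):
        for p in self._positions(name):
            self.bits[p] = 1

    def might_contain(self, name):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p] for p in self._positions(name))


class ToyRow:
    """One row: sorted columns, a column bloom filter and a sampled column index.

    The index keeps the first column name of every `block_size` columns,
    loosely imitating an index of column blocks (hypothetical layout).
    """

    def __init__(self, columns, block_size=4):
        self.names = sorted(columns)
        self.values = [columns[n] for n in self.names]
        self.bloom = ColumnBloomFilter()
        for n in self.names:
            self.bloom.add(n)
        self.block_size = block_size
        self.index = self.names[::block_size]

    def get_by_names(self, wanted):
        """Query by name: the bloom filter lets us skip names that are absent."""
        found = {}
        for name in wanted:
            if not self.bloom.might_contain(name):
                continue
            i = bisect.bisect_left(self.names, name)
            if i < len(self.names) and self.names[i] == name:
                found[name] = self.values[i]
        return found

    def get_slice(self, start, count):
        """Slice from `start`: no bloom filter, binary-search the index instead."""
        block = max(bisect.bisect_right(self.index, start) - 1, 0)
        i = block * self.block_size  # first column of the block that may hold `start`
        while i < len(self.names) and self.names[i] < start:
            i += 1  # scan forward from the block start, like reading from its offset
        return list(zip(self.names[i:i + count], self.values[i:i + count]))


row = ToyRow({f"c{i:02d}": i for i in range(10)})
print(row.get_by_names(["c03", "missing"]))  # {'c03': 3}
print(row.get_slice("c05", 3))  # [('c05', 5), ('c06', 6), ('c07', 7)]
```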
> 
> Hope that helps. 
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 13/09/2012, at 2:30 AM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:
> 
>> Thanks for the clarification. Even though compression solves the disk space issue, we might still have Memtable bloat, right?
>> 
>> There is another issue to be handled for us. The queries are always going to be range queries with an absolute match on part 1 and a range on part 2 of the composite columns.
>> 
>> Ex: Query <some-key> <Column-part-1> <Start-Id-part-2> <Limit>
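The query shape Ravi describes (exact match on part 1, a start value for part 2, and a limit) is a contiguous slice, because composite names sort component by component. A minimal Python sketch, assuming tuples as a stand-in for serialized composite column names; `composite_slice` is a hypothetical helper, not a Cassandra or client API.

```python
import bisect


def composite_slice(columns, part1, start_part2, limit):
    """Return up to `limit` columns with name[0] == part1 and name[1] >= start_part2.

    `columns` maps (part1, part2) tuples to values; sorting the tuples mimics
    a composite comparator that orders by part 1, then part 2.
    """
    names = sorted(columns)
    i = bisect.bisect_left(names, (part1, start_part2))  # first name >= the start
    result = []
    while i < len(names) and len(result) < limit and names[i][0] == part1:
        result.append((names[i], columns[names[i]]))
        i += 1
    return result


cols = {("a", 1): "x", ("a", 5): "y", ("a", 9): "z", ("b", 2): "w"}
print(composite_slice(cols, "a", 4, 10))  # [(('a', 5), 'y'), (('a', 9), 'z')]
```

The slice stops as soon as part 1 changes, so columns for other part 1 values are never returned even when the limit is not reached.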

>> 
>> Range queries do not use bloom filters. That holds for composite columns also, right? I believe I will end up writing BF bytes only to skip them later.
>> 
>> If sharing had been possible, then <Column-part-1> alone could have gone into the bloom filter, speeding up my queries really effectively.
>> 
>> But as I understand it, there are many levels of nesting possible in a composite type, and special casing at every level is a big task.
>> 
>> Maybe special casing the top level, or the first part, would be a good start?
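Ravi's suggestion of putting only <Column-part-1> into the bloom filter can be sketched as a prefix bloom filter. This is hypothetical: per Sylvain, the storage engine has no special casing for composite columns, so nothing like this exists in Cassandra here, and the class and method names are made up for illustration.

```python
import hashlib


class PrefixBloomFilter:
    """Hypothetical bloom filter that hashes only part 1 of a composite name."""

    def __init__(self, m_bits=512, k=3):
        self.m = m_bits
        self.k = k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, part1):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}|{part1}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add_column(self, name):
        part1, _part2 = name  # part 2 is deliberately ignored
        for p in self._positions(part1):
            self.bits |= 1 << p

    def may_contain_part1(self, part1):
        # False: no column in this row starts with part1, so the slice can be skipped.
        return all((self.bits >> p) & 1 for p in self._positions(part1))


bf = PrefixBloomFilter()
for name in [("user1", 10), ("user1", 20), ("user2", 5)]:
    bf.add_column(name)
before = bf.bits
bf.add_column(("user1", 30))  # same part 1: no new bits are set
print(bf.bits == before, bf.may_contain_part1("user1"))  # True True
```

Because every column sharing a part 1 sets the same bits, the filter stays small however many part 2 values exist, and a negative answer would let a part 2 range query skip the row entirely.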
>> 
>> --
>> Ravi
>> 
>> On Wed, Sep 12, 2012 at 5:46 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
>> > Is every <string>/<id> combination stored separately in disk
>> 
>> Yes, each combination is stored separately on disk (the storage engine
>> itself doesn't have special casing for composite columns, at least not
>> yet). But as far as disk space is concerned, I suspect that sstable
>> compression makes this largely a non-issue.
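Why compression absorbs most of the cost Sylvain mentions: if every <string>/<id> combination is written as its own column name, the string is repeated once per column, and a general-purpose compressor collapses that repetition. A minimal Python sketch, using zlib (DEFLATE) as a stand-in for whichever sstable compressor is configured; the length-prefixed `part1:id` layout and `serialize_columns` helper are hypothetical, not the SSTable format.

```python
import zlib


def serialize_columns(part1, ids):
    """Write each (part1, id) combination as its own length-prefixed column name."""
    out = bytearray()
    for i in ids:
        name = f"{part1}:{i}".encode()
        out += len(name).to_bytes(2, "big") + name  # part1 is repeated every time
    return bytes(out)


raw = serialize_columns("some-fairly-long-category-string", range(1000))
packed = zlib.compress(raw)
print(len(raw), len(packed))  # compressed size is a small fraction of the raw size
```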
>> 
>> --
>> Sylvain
>> 
> 
> 

