incubator-cassandra-dev mailing list archives

From David Alves <davidral...@gmail.com>
Subject Re: implementation choice with regard to multiple range slice query filters
Date Wed, 04 Apr 2012 00:22:30 GMT
cool, thanks.

-david

On Apr 4, 2012, at 1:01 AM, Jonathan Ellis wrote:

> You need more than column_index_size_in_kb worth of column data for it
> to generate row header index entries.  We have a cassandra.yaml in
> test/conf that sets that extra low, to 4, to make that easier.  "ant
> test" sets up the environment to point to that yaml, but if you're
> running it from your IDE you might be missing that.
> 
> Assuming that's working correctly, TableTest.testGetSliceFromLarge is
> a relevant example.  In particular, note this part:
> 
>        ArrayList<IndexHelper.IndexInfo> indexes = IndexHelper.deserializeIndex(file);
>        assert indexes.size() > 2;
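The sketch below is a simplified model of why small test rows never get a column index. It rests on two assumptions of ours, not on Cassandra's actual indexing code: a block is cut whenever the bytes written since the last block reach column_index_size_in_kb, and a row that fits in a single block gets no index entries. ColumnIndexModel, Block, buildIndex and sampleNames are hypothetical names. Block only loosely mirrors IndexHelper.IndexInfo.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, NOT Cassandra's real indexing code: columns are written in
// sorted order and a block is closed once its bytes reach the threshold.
public class ColumnIndexModel {
    // Hypothetical stand-in for IndexHelper.IndexInfo.
    public record Block(String firstName, String lastName, long width) {}

    public static List<Block> buildIndex(List<String> names, int columnSizeBytes, int indexSizeKb) {
        long threshold = indexSizeKb * 1024L;
        List<Block> blocks = new ArrayList<>();
        String first = null;
        String last = null;
        long width = 0;
        for (String name : names) {
            if (first == null) first = name;   // first column of a new block
            last = name;
            width += columnSizeBytes;
            if (width >= threshold) {           // block is full: record it
                blocks.add(new Block(first, last, width));
                first = null;
                width = 0;
            }
        }
        if (first != null) blocks.add(new Block(first, last, width)); // trailing partial block
        // Assumption: a row that fits in a single block gets no index entries.
        if (blocks.size() <= 1) return List.of();
        return blocks;
    }

    // n hypothetical column names: col000, col001, ...
    public static List<String> sampleNames(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) names.add(String.format("col%03d", i));
        return names;
    }

    public static void main(String[] args) {
        List<String> names = sampleNames(100);                 // ~10 KB at 100 bytes per column
        System.out.println(buildIndex(names, 100, 64).size()); // 64 KB threshold
        System.out.println(buildIndex(names, 100, 4).size());  // 4 KB, as in test/conf
    }
}
```

With 100 columns of roughly 100 bytes each (about 10 KB), this model produces no index at a 64 KB threshold but three blocks at 4 KB. Three blocks would satisfy a check like `indexes.size() > 2`.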
> 
> On Tue, Apr 3, 2012 at 6:23 PM, David Alves <davidralves@gmail.com> wrote:
>> Hi
>> 
>>        Jonathan: Thanks for the tip. Although the first option I proposed would not incur that penalty, it would not take advantage of the column index for the middle ranges.
>> 
>>        On a related matter, I'm struggling to test the IndexedBlockFetcher implementation (SimpleBlockFetcher is working fine), as none of the tests in ColumnFamilyStoreTest seem to use it (rowIndexEntry.columnsIndex().isEmpty() is always true in ISR). Is there an easy way to get the columns index built for testing?
>> 
>> Cheers
>> -david
>> 
>> On Apr 3, 2012, at 5:58 AM, Jonathan Ellis wrote:
>> 
>>> That would work, but I think the best approach would actually push
>>> multiple ranges down into ISR itself, otherwise you could waste a lot
>>> of time reading the row header redundantly (the
>>> skipBloomFilter/deserializeIndex part).
>>> 
>>> The tricky part would be getting IndexedBlockFetcher to avoid extra
>>> work when the ranges' index blocks overlap -- in other words, the best
>>> of both worlds, where we "skip ahead" at the end of one range when the
>>> index says we can, but do a seq scan when that is more efficient.
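The idea of pushing multiple ranges down into ISR can be sketched as follows. This is a simplified, hypothetical version, not Cassandra's IndexedBlockFetcher API. The column index is deserialized once. Then, for all requested ranges together, the sketch computes which index blocks must be read, keeps each block at most once even when two ranges share it, and coalesces adjacent blocks into one sequential scan. A gap between blocks means skipping ahead via the index. Block, Range, blocksToRead and runs are illustrative names, and String comparison stands in for the column comparator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of planning one pass over a row for several slice ranges, using a
// column index that was deserialized once. Names are hypothetical.
public class MultiRangeBlockPlanner {
    // One index block: the first and last column names it covers.
    public record Block(String first, String last) {}
    // One requested slice, inclusive on both ends.
    public record Range(String start, String finish) {}

    // Block numbers that may hold columns for any range, each listed once, in disk order.
    public static List<Integer> blocksToRead(List<Block> index, List<Range> ranges) {
        TreeSet<Integer> wanted = new TreeSet<>();
        for (Range r : ranges) {
            for (int i = 0; i < index.size(); i++) {
                Block b = index.get(i);
                boolean overlaps = b.last().compareTo(r.start()) >= 0
                        && b.first().compareTo(r.finish()) <= 0;
                if (overlaps) wanted.add(i); // a block shared by two ranges is kept once
            }
        }
        return new ArrayList<>(wanted);
    }

    // Coalesces adjacent blocks into {firstBlock, lastBlock} runs: one seek per run,
    // then a sequential read through the run.
    public static List<int[]> runs(List<Integer> blocks) {
        List<int[]> out = new ArrayList<>();
        for (int b : blocks) {
            int[] current = out.isEmpty() ? null : out.get(out.size() - 1);
            if (current != null && current[1] + 1 == b) {
                current[1] = b;              // next block on disk: keep scanning
            } else {
                out.add(new int[] {b, b});   // gap: skip ahead using the index
            }
        }
        return out;
    }

    public static List<Block> exampleIndex() {
        return List.of(new Block("a", "c"), new Block("d", "f"), new Block("g", "i"),
                       new Block("j", "l"), new Block("m", "o"));
    }

    public static List<Range> exampleRanges() {
        return List.of(new Range("b", "e"), new Range("f", "f"), new Range("n", "z"));
    }

    public static void main(String[] args) {
        List<Integer> blocks = blocksToRead(exampleIndex(), exampleRanges());
        System.out.println(blocks); // [0, 1, 4]
        for (int[] run : runs(blocks)) System.out.println(run[0] + ".." + run[1]);
    }
}
```

In this example, ranges b..e and f..f both touch block 1, but it is read once. The reads become a sequential pass over blocks 0..1 plus a single skip to block 4. This sketch coalesces only strictly adjacent blocks. A real implementation might also scan through a small gap when that is cheaper than seeking.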
>>> 
>>> (Here's where I admit that I've asked several people to implement 3885
>>> as a technical interview problem for DataStax.  For the purposes of
>>> that interview, this last part is optional.)
>>> 
>>> On Mon, Apr 2, 2012 at 11:19 PM, David Alves <davidralves@gmail.com> wrote:
>>>> Hi guys
>>>> 
>>>>        I'm a PhD student, and as a long-time fan I'm trying to dip my toes in the water of Cassandra development.
>>>>        I'm implementing CASSANDRA-3885, which adds support for returning multiple slices of a row.
>>>> 
>>>>        After looking at the portion of the code involved, two implementation options come to mind, and I'd like your feedback on which one you think would work best (or even whether I'm on the right track).
>>>> 
>>>>        As a first approach, I simply subclassed SliceQueryFilter (setting start and finish to firstRange.start and lastRange.finish) and made the subclass not return the elements between the ranges (spinning forward to the first element of the next range whenever the final element of the previous range was found). This approach uses only one IndexedSliceReader, but it scans everything from firstRange.start to lastRange.finish.
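The first option boils down to one scan over the covering span, with a pointer into the sorted ranges that drops columns falling in the gaps. The sketch below shows that filtering step in isolation, assuming the ranges are sorted and non-overlapping. GapSkippingFilter, Range and filter are hypothetical names, not the SliceQueryFilter subclass itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of option 1: one scan from the first range's start to the last range's
// finish, dropping columns that fall in the gaps. Names are hypothetical.
public class GapSkippingFilter {
    // One requested slice, inclusive; ranges are assumed sorted and non-overlapping.
    public record Range(String start, String finish) {}

    public static List<String> filter(Iterator<String> columns, List<Range> ranges) {
        List<String> out = new ArrayList<>();
        int r = 0; // index of the current range
        while (columns.hasNext()) {
            String col = columns.next();
            // Move past every range that ends before this column.
            while (r < ranges.size() && col.compareTo(ranges.get(r).finish()) > 0) r++;
            if (r == ranges.size()) break;          // past the last range: stop scanning
            if (col.compareTo(ranges.get(r).start()) >= 0) {
                out.add(col);                       // inside the current range
            }
            // Otherwise the column sits in a gap: it was read but is not returned.
        }
        return out;
    }

    public static List<String> exampleRow() {
        return List.of("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
    }

    public static List<Range> exampleRanges() {
        return List.of(new Range("b", "c"), new Range("f", "g"), new Range("i", "i"));
    }

    public static void main(String[] args) {
        System.out.println(filter(exampleRow().iterator(), exampleRanges())); // [b, c, f, g, i]
    }
}
```

The output is correct, but columns d, e and h are still read and discarded. That is the cost that grows when the ranges are a sparse selection of a wide row.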
>>>> 
>>>>        Still, as I was finishing, it occurred to me that when the filter's selectivity is very low (i.e., the ranges are a sparse selection of the total number of columns), I might be doing a full row scan for nothing. So another option came to mind: an iterator of iterators, where I use one IndexedSliceReader for each of the required slice ranges and simply iterate through them.
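The second option, an iterator of iterators, can be sketched as one iterator per range chained into a single iterator. In this hypothetical version, a TreeSet stands in for the on-disk row, and NavigableSet.subSet stands in for opening one IndexedSliceReader per range. ConcatenatedSlices, Range, concat and slices are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.NoSuchElementException;
import java.util.TreeSet;

// Sketch of option 2: one iterator per slice range, chained into a single
// iterator. Names are hypothetical; a TreeSet stands in for the on-disk row.
public class ConcatenatedSlices {
    public record Range(String start, String finish) {}

    // Iterator of iterators: drains parts[0], then parts[1], and so on.
    public static <T> Iterator<T> concat(List<Iterator<T>> parts) {
        return new Iterator<T>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                while (i < parts.size()) {
                    if (parts.get(i).hasNext()) return true;
                    i++; // current part exhausted: move to the next range
                }
                return false;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                return parts.get(i).next();
            }
        };
    }

    public static List<String> slices(NavigableSet<String> row, List<Range> ranges) {
        List<Iterator<String>> parts = new ArrayList<>();
        for (Range r : ranges) {
            // Stand-in for opening one reader per range; subSet jumps straight to
            // r.start(), so columns between ranges are never visited.
            parts.add(row.subSet(r.start(), true, r.finish(), true).iterator());
        }
        List<String> out = new ArrayList<>();
        concat(parts).forEachRemaining(out::add);
        return out;
    }

    public static NavigableSet<String> exampleRow() {
        return new TreeSet<>(List.of("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"));
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(new Range("b", "c"), new Range("f", "g"), new Range("i", "i"));
        System.out.println(slices(exampleRow(), ranges)); // [b, c, f, g, i]
    }
}
```

The gaps are never scanned here. However, with real per-range readers, each one would reread the row header (the skipBloomFilter/deserializeIndex step), which is the overhead the reply above points out.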
>>>> 
>>>>        Which do you think is the better option? Am I making any sense, or am I completely off track?
>>>> 
>>>>        Any help would be greatly appreciated.
>>>> 
>>>> Cheers
>>>> David Ribeiro Alves
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>> 
> 

