incubator-cassandra-dev mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: implementation choice with regard to multiple range slice query filters
Date Tue, 03 Apr 2012 04:58:03 GMT
That would work, but I think the best approach would actually be to
push multiple ranges down into ISR itself; otherwise you could waste a
lot of time reading the row header redundantly (the
skipBloomFilter/deserializeIndex part).

The tricky part would be getting IndexedBlockFetcher to not do extra
work in the case where the ranges' index blocks overlap -- in other
words, the best of both worlds, where we "skip ahead" when the index
says we can at the end of one range, but do a sequential scan when
that is more efficient.
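
To make the skip-ahead vs. sequential-scan idea concrete, here is a rough
sketch of how one might pick the index blocks to read for several ranges
while reading the row header only once. It is not the actual
IndexedSliceReader or IndexedBlockFetcher code: the Block and Range classes
and the blocksToRead method are hypothetical, column names are simplified to
ints, and it assumes the blocks are sorted and disjoint and the ranges are
sorted by start and do not overlap each other.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiRangeSketch {
    /** Hypothetical stand-in for one column-index entry: first and last column name in a block. */
    static final class Block {
        final int first, last;
        Block(int first, int last) { this.first = first; this.last = last; }
    }

    /** Hypothetical slice range, inclusive on both ends. */
    static final class Range {
        final int start, finish;
        Range(int start, int finish) { this.start = start; this.finish = finish; }
    }

    /**
     * Returns the positions of the blocks needed to serve all ranges, each at most once.
     * Assumes sorted, disjoint blocks and sorted, non-overlapping ranges.
     */
    static List<Integer> blocksToRead(List<Block> blocks, List<Range> ranges) {
        List<Integer> result = new ArrayList<>();
        int b = 0;
        for (Range r : ranges) {
            // Skip ahead: blocks that end before this range starts hold nothing we want.
            while (b < blocks.size() && blocks.get(b).last < r.start) {
                b++;
            }
            // Sequential scan: take consecutive blocks while they still overlap the range.
            int i = b;
            while (i < blocks.size() && blocks.get(i).first <= r.finish) {
                // Two ranges may share a block; do not schedule it twice.
                if (result.isEmpty() || result.get(result.size() - 1) != i) {
                    result.add(i);
                }
                i++;
            }
            // The next range may begin in the last block we touched, so stay on it.
            if (i > b) {
                b = i - 1;
            }
        }
        return result;
    }
}
```

With five blocks covering 0-9, 10-19, 20-29, 30-39 and 40-49, and the ranges
[2,4], [6,12] and [41,43], this selects blocks 0, 1 and 4: the first two
ranges share block 0 without rereading it, and blocks 2 and 3 are skipped.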

(Here's where I admit that I've asked several people to implement 3885
as a technical interview problem for DataStax.  For the purposes of
that interview, this last part is optional.)

On Mon, Apr 2, 2012 at 11:19 PM, David Alves <davidralves@gmail.com> wrote:
> Hi guys
>
>        I'm a PhD student and I'm trying to dip my feet in the water with
> Cassandra development, as I'm a long-time fan.
>        I'm implementing CASSANDRA-3885, which pertains to supporting
> returning multiple slices of a row.
>
>        After looking around at the portion of the code that is involved,
> two implementation options come to mind, and I'd like to get feedback from
> you on whichever you think might work best (or even whether I'm on the
> right track).
>
>        As a first approach I simply subclassed SliceQueryFilter (setting
> start and finish to firstRange.start and lastRange.finish) and made the
> subclass not return the elements in between the ranges (spinning to the
> first element of the next range whenever the final element of the previous
> range was found). This approach only uses one IndexedSliceReader but it
> scans from firstRange.start to lastRange.finish.
>
>        Still, as I was finishing, it occurred to me that in cases where the
> filter's selectivity is very low, i.e., the ranges are a sparse selection
> of the total number of columns, I might be doing a full row scan for
> nothing, so another option came to mind: an iterator of iterators, where I
> use one IndexedSliceReader for each of the required slice ranges and
> simply iterate through them.
>
>        Which do you think is the better option? Am I making any sense, or
> am I completely off track?
>
>        Any help would be greatly appreciated.
>
> Cheers
> David Ribeiro Alves
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
