incubator-cassandra-user mailing list archives

From Aditya Narayan <ady...@gmail.com>
Subject Re: Confused about get_slice SliceRange behavior with bloom filter
Date Sun, 13 Feb 2011 21:19:35 GMT
Jonathan,
If I ask for around 150-200 columns (random, not sequential) from a
very wide row that contains a million or more columns, does the read
performance of the SliceQuery operation depend on the length of the
row? (For my use case, I would pass a list of column names to this
SliceQuery operation.)


Thanks
Aditya

On Sun, Feb 13, 2011 at 8:41 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> On Sun, Feb 13, 2011 at 12:37 AM, E S <tr1sklion@yahoo.com> wrote:
> > I've gotten myself really confused by
> > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping
> > someone can help me understand what the io behavior of this operation
> > would be.
> >
> > When I do a get_slice for a column range, will it seek to every SSTable?
> > I had thought that it would use the bloom filter on the row key so that
> > it would only do a seek to SSTables that have a very high probability of
> > containing columns for that row.
>
> Yes.
>
> > In the linked doc above, it seems to say that it is only used for
> > exact column names.  Am I misunderstanding this?
>
> Yes.  You may be confusing multi-row behavior with multi-column.
>
> > On a related note, if instead of using a SliceRange I provide an explicit
> > list of columns, will I have to read all SSTables that have values for the
> > columns
>
> Yes.
>
> > or is it smart enough to stop after finding a value from the most recent
> > SSTable?
>
> There is no way to know which value is most recent without having to
> read it first.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
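
The two answers above (the bloom filter on the row key limits which SSTables are touched, and every remaining SSTable must still be read to find the newest value) can be illustrated with a small sketch. This is not Cassandra's code: the BloomFilter, SSTable, and get_columns names are hypothetical, the filter parameters are arbitrary, and each "SSTable" is just an in-memory dict. It assumes only that column versions are reconciled by timestamp.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter over row keys (hypothetical, not Cassandra's)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a single Python int

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Stand-in for an on-disk SSTable: {row_key: {column: (value, timestamp)}}."""

    def __init__(self, rows):
        self.rows = rows
        self.bloom = BloomFilter()
        for row_key in rows:
            self.bloom.add(row_key)
        self.seeks = 0  # counts simulated disk reads

    def read_columns(self, row_key, names):
        self.seeks += 1
        row = self.rows.get(row_key, {})
        return {name: row[name] for name in names if name in row}


def get_columns(sstables, row_key, names):
    """Read named columns, keeping the highest-timestamp version of each."""
    merged = {}
    for table in sstables:
        # The row-key bloom filter lets us skip tables that cannot hold the row.
        if not table.bloom.might_contain(row_key):
            continue
        # Every table that passes must be read: a later table in the list
        # may hold a newer version than one we have already seen.
        for name, (value, ts) in table.read_columns(row_key, names).items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return {name: value for name, (value, _ts) in merged.items()}
```

For example, with one SSTable holding {"row1": {"a": ("old", 1), "b": ("b1", 1)}} and another holding {"row1": {"a": ("new", 2)}}, get_columns(..., "row1", ["a", "b"]) reads both tables and returns the newer "a" regardless of the order in which the tables are listed. A table that holds only other row keys is skipped unless its bloom filter gives a false positive.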
