incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Confused about get_slice SliceRange behavior with bloom filter
Date Sun, 13 Feb 2011 23:37:00 GMT
AFAIK yes. 

Until your row reaches column_index_size_in_kb in size (and, in some circumstances, until a
compaction has run), the row has no column index, so the code has to scan through all of the
columns in the row to find the 150-200 you want.

From the help in cassandra.yaml

# Add column indexes to a row after its contents reach this size.
# Increase if your column values are large, or if you have a very large
# number of columns.  The competing causes are, Cassandra has to
# deserialize this much of the row to read a single column, so you want
# it to be small - at least if you do many partial-row reads - but all
# the index data is read for each access, so you don't want to generate
# that wastefully either.
column_index_size_in_kb: 64

If you are creating a lot of valueless columns, it may pay to work out how many it would take
before the row reaches this size. Note that I have not checked, but I assume the column name
is included when determining the row size.
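
As a rough sketch of that calculation (not checked against the code): the 15-byte per-column overhead below is an assumption (2 bytes name length, 1 byte flags, 8 bytes timestamp, 4 bytes value length), and the function name is made up for illustration. It also assumes the column name counts toward the row size, as guessed above.

```python
# Assumed per-column serialization overhead in bytes (an assumption, not
# taken from the Cassandra source): 2 (name length) + 1 (flags)
# + 8 (timestamp) + 4 (value length).
ASSUMED_COLUMN_OVERHEAD_BYTES = 15


def columns_until_indexed(name_bytes, value_bytes=0,
                          column_index_size_in_kb=64,
                          overhead=ASSUMED_COLUMN_OVERHEAD_BYTES):
    """Estimate how many columns a row needs before it reaches
    column_index_size_in_kb, given the average name and value sizes."""
    per_column = overhead + name_bytes + value_bytes
    threshold = column_index_size_in_kb * 1024
    # Integer ceiling division: the smallest count whose total size
    # is at least the threshold.
    return -(-threshold // per_column)


if __name__ == "__main__":
    # Valueless columns with 8-byte names (e.g. long ids) at the default 64 KB.
    print(columns_until_indexed(name_bytes=8))
```

With these assumptions, rows holding fewer columns than the printed estimate would still be scanned in full on a partial-row read.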

Hope that helps.
Aaron

On 14 Feb 2011, at 10:19, Aditya Narayan wrote:

> Jonathan,
> If I ask for around 150-200 columns (totally random, not sequential) from a very wide
> row that contains a million or more columns, does the read performance of
> the SliceQuery operation depend on the length of the row? (For my use
> case, I would use the column names list for this SliceQuery operation.)
> 
> 
> Thanks
> Aditya
> 
> On Sun, Feb 13, 2011 at 8:41 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> On Sun, Feb 13, 2011 at 12:37 AM, E S <tr1sklion@yahoo.com> wrote:
> > I've gotten myself really confused by
> > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping someone can
> > help me understand what the io behavior of this operation would be.
> >
> > When I do a get_slice for a column range, will it seek to every SSTable?  I had
> > thought that it would use the bloom filter on the row key so that it would only
> > do a seek to SSTables that have a very high probability of containing columns
> > for that row.
> 
> Yes.
> 
> > In the linked doc above, it seems to say that it is only used for
> > exact column names.  Am I misunderstanding this?
> 
> Yes.  You may be confusing multi-row behavior with multi-column.
> 
> > On a related note, if instead of using a SliceRange I provide an explicit list
> > of columns, will I have to read all SSTables that have values for the columns
> 
> Yes.
> 
> > or is it smart enough to stop after finding a value from the most recent
> > SSTable?
> 
> There is no way to know which value is most recent without having to
> read it first.
> 
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 

