incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Confused about get_slice SliceRange behavior with bloom filter
Date Mon, 14 Feb 2011 08:36:31 GMT
As Aaron said, if the whole row is under 64k, it won't matter. But since
you spoke of a very wide row, I'll assume the whole row will be much more
than 64k.

If so, the row is indexed by block (of 64k, configurable). The read
performance then depends on how many of those blocks are needed for the
query, since each block potentially means a seek ("potentially" because
some blocks could happen to be sequential on disk). So if the columns you
ask for are really randomly distributed, then yes: the bigger the row is,
the greater the chance of having to hit many blocks, and the greater the
chance that these blocks are far apart on disk.
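
To make that concrete, here is a rough back-of-the-envelope sketch in
Python. It is not Cassandra code: the function names and the byte offsets
are made up for illustration, and the estimate assumes the requested
columns fall uniformly at random across the row's blocks.

```python
BLOCK_SIZE = 64 * 1024  # 64k index block, the default mentioned above


def blocks_touched(column_offsets, block_size=BLOCK_SIZE):
    """Return the set of block numbers containing the given byte offsets.

    column_offsets are hypothetical positions of the requested columns
    within the serialized row; each distinct block is a potential seek.
    """
    return {offset // block_size for offset in column_offsets}


def expected_blocks(total_blocks, columns_requested):
    """Expected number of distinct blocks hit, assuming each requested
    column lands in a uniformly random block (an assumption, not a
    measurement)."""
    miss = (1 - 1 / total_blocks) ** columns_requested
    return total_blocks * (1 - miss)


if __name__ == "__main__":
    # A row under 64k: every column is in block 0, so one block at most.
    print(len(blocks_touched([0, 100, 65535])))
    # 200 random columns from a row spanning 1000 blocks.
    print(round(expected_blocks(1000, 200), 1))
```

Under those assumptions, once the row spans many more blocks than the
number of columns requested, nearly every requested column ends up in its
own block, which is why the seek count grows with the row size.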

--
Sylvain

On Sun, Feb 13, 2011 at 10:19 PM, Aditya Narayan <adynnn@gmail.com> wrote:

> Jonathan,
> If I ask for around 150-200 columns (totally random, not sequential) from
> a very wide row that contains a million or more columns, is the read
> performance of the SliceQuery operation affected by, or does it "depend
> on", the length of the row? (For my use case, I would use the column names
> list for this SliceQuery operation.)
>
>
> Thanks
> Aditya
>
>
> On Sun, Feb 13, 2011 at 8:41 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> On Sun, Feb 13, 2011 at 12:37 AM, E S <tr1sklion@yahoo.com> wrote:
>> > I've gotten myself really confused by
>> > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping
>> > someone can help me understand what the io behavior of this operation
>> > would be.
>> >
>> > When I do a get_slice for a column range, will it seek to every SSTable?
>> > I had thought that it would use the bloom filter on the row key so that
>> > it would only do a seek to SSTables that have a very high probability of
>> > containing columns for that row.
>>
>> Yes.
>>
>> > In the linked doc above, it seems to say that it is only used for
>> > exact column names.  Am I misunderstanding this?
>>
>> Yes.  You may be confusing multi-row behavior with multi-column.
>>
>> > On a related note, if instead of using a SliceRange I provide an
>> > explicit list of columns, will I have to read all SSTables that have
>> > values for the columns
>>
>> Yes.
>>
>> > or is it smart enough to stop after finding a value from the most recent
>> > SSTable?
>>
>> There is no way to know which value is most recent without having to
>> read it first.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>
