incubator-cassandra-user mailing list archives

From Mike Malone <>
Subject Re: Range scan performance in 0.6.0 beta2
Date Mon, 29 Mar 2010 15:00:41 GMT
On Mon, Mar 29, 2010 at 7:13 AM, Henrik Schröder <> wrote:

> On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis <> wrote:
>> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder <>
>> wrote:
>> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis <>
>> wrote:
>> >> It's a unique index then?  And you're trying to read things ordered by
>> >> the index, not just "give me keys that have a column with this
>> >> value?"
>> >
>> > Yes, because if we have more than one column per row, there's no way of
>> > (easily) limiting the result.
>> That's exactly what the count parameter of SliceRange is for... ?
> I thought that only limited the number of columns per key?
> We're using the get_range_slices method, which takes both a SlicePredicate
> (which contains a range, which contains a count) and a KeyRange (which also
> contains a count). Say that we have a bunch of keys that each contain 10
> columns, and we do a get_range_slices over those, how do we get the first 25
> columns? If we put it in the SliceRange count, we'll get all matching rows,
> and the 25 first columns of each, right? And if we put it in the KeyRange
> count, we'll get the 25 first rows that match, and all their columns, right?
> But if we have only one column per row, then we can limit the results the
> way we want to. Or have we misunderstood the api somehow?

We've run into the same issue and have a patch that limits the _total_
number of columns returned instead of limiting on number of rows / number of
columns per row. This makes it convenient to build a two-dimensional index:
the first key is the row key, the second is the column name, and the column
value is the thing you're indexing. Then you do a get_range_slices on the two
keys, limiting on the total number of columns returned.
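
To make the difference between the three limits concrete, here is a rough
sketch in plain Python. It is a toy in-memory model, not the Thrift API and
not our patch: the names range_slices and range_slices_total are made up for
illustration, and it assumes rows come back in sorted key order (as with an
order-preserving partitioner).

```python
# Toy model: row key -> list of (column name, value), sorted by column name.
rows = {
    "a": [("c1", 1), ("c2", 2), ("c3", 3)],
    "b": [("c1", 4), ("c2", 5), ("c3", 6)],
    "c": [("c1", 7), ("c2", 8), ("c3", 9)],
}


def range_slices(rows, key_count, column_count):
    """Mimic the two existing limits: at most key_count rows (the KeyRange
    count) and at most column_count columns from each row (the SliceRange
    count)."""
    result = []
    for key in sorted(rows)[:key_count]:
        result.append((key, rows[key][:column_count]))
    return result


def range_slices_total(rows, total_count):
    """Hypothetical variant: cap the total number of columns returned
    across all rows, walking rows in key order."""
    result = []
    remaining = total_count
    for key in sorted(rows):
        if remaining <= 0:
            break
        columns = rows[key][:remaining]
        result.append((key, columns))
        remaining -= len(columns)
    return result


def count_columns(result):
    """Total number of columns in a result list."""
    return sum(len(columns) for _, columns in result)
```

With this data, range_slices(rows, 2, 100) returns two whole rows,
range_slices(rows, 100, 2) returns two columns from every row, and only
range_slices_total(rows, 4) stops after exactly four columns, which is the
behaviour Henrik is asking for.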

We haven't run any real performance metrics yet. I don't think this query is
particularly performant, but it's certainly faster than doing the same
operation on the client side.

Another thing to keep in mind is that rows must fit in memory because
they're serialized / deserialized into memory from time to time. I believe
this happens during SSTable serialization. Feel free to verify/correct me on
this.

If people are interested I can probably get that patch pushed back upstream
soon. We're in crunch mode right now for launch though so, unfortunately,
it'll probably be a week or so before we can finish it up and properly vet
it.

