cassandra-user mailing list archives

From Edward Capriolo <>
Subject Re: please help with multiget
Date Tue, 18 Jan 2011 22:30:50 GMT
On Tue, Jan 18, 2011 at 4:29 PM, Shu Zhang <> wrote:
> Well, I don't think what I'm describing is complicated semantics. I think I've described
general batch operation design and something that is symmetrical to the batch_mutate method already
on the Cassandra API. You are right, I can solve the problem with further denormalization,
and the approach of making individual gets in parallel as described by Brandon will work too.
I'll be doing one of these for now. But I think neither is as efficient, and I guess I'm still
not sure why the multiget is designed the way it is.
> The problem with denormalization is you have to make multiple row writes in place of one,
adding load to the server, consuming extra physical space and losing atomicity on write operations.
I know writes are cheap in cassandra, and you can catch failed writes and retry so these problems
are not major, but it still seems clear that having a batch-get that works appropriately is
at least a little better...
> ________________________________________
> From: Aaron Morton []
> Sent: Tuesday, January 18, 2011 12:55 PM
> To:
> Subject: Re: please help with multiget
> I think the general approach is to denormalise data to remove the need for complicated
semantics when reading.
> Aaron
> On 19/01/2011, at 7:57 AM, Shu Zhang <> wrote:
>> Well, maybe making a batch-get is not any more efficient on the server side but
without it, you can get bottlenecked on client-server connections and client resources. If
the number of requests you want to batch is on the order of connections in your pool, then
yes, making gets in parallel is as good or maybe better. But what if you want to batch thousands
of requests?
>> The server I can scale out, I would want to get my requests there without needing
to wait for connections on my client to free up.
>> I just don't really understand the reasoning for designing multiget_slice the way
it is. I still think if you're gonna have a batch-get request (multiget_slice), you should
be able to add to the batch a reasonable number of ANY corresponding non-batch get requests.
And you can't do that... Plus, it's not symmetrical to the batch-mutate. Is there a good reason
for that?
>> ________________________________________
>> From: Brandon Williams []
>> Sent: Monday, January 17, 2011 5:09 PM
>> To:
>> Cc:
>> Subject: Re: please help with multiget
>> On Mon, Jan 17, 2011 at 6:53 PM, Shu Zhang <> wrote:
>> Here's the method declaration for quick reference:
>> map<string,list<ColumnOrSuperColumn>> multiget_slice(string keyspace,
list<string> keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)
>> It looks like you must have the same SlicePredicate for every key in your batch retrieval,
so what are you supposed to do when you need to retrieve different columns for different keys?
>> Issue multiple gets in parallel yourself.  Keep in mind that multiget is not an
optimization, in fact, it can work against you when one key exceeds the rpc timeout, because
you get nothing back.
>> -Brandon
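
The parallel-get approach Brandon describes can be sketched roughly as below. The `FakeClient` class and its `get_slice` signature are a hypothetical stand-in for a real Thrift client, not the actual Cassandra API; the point is only that each key gets its own predicate and its own failure domain.

```python
# Sketch of issuing independent get_slice calls in parallel from the
# client, one per key, each with its own column predicate.
from concurrent.futures import ThreadPoolExecutor

class FakeClient:
    """Hypothetical stand-in: echoes back which columns were requested."""
    def get_slice(self, key, predicate):
        return [(key, col) for col in predicate]

def parallel_gets(client, requests, max_workers=8):
    """requests: dict mapping row key -> list of column names to fetch."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(client.get_slice, key, pred)
                   for key, pred in requests.items()}
        # Collect per-key results; a failed or slow key only affects that
        # key, unlike multiget_slice where one slow key can time out the
        # whole batch.
        return {key: f.result() for key, f in futures.items()}

results = parallel_gets(FakeClient(),
                        {"user1": ["name"], "user2": ["name", "email"]})
```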

multiget_slice is very useful, IMHO. In my testing, the roundtrip time
for 1000 get requests all being acked individually is much higher than
the roundtrip time for 200 multiget_slice calls with 5 keys each. Anyone who
needs that type of access is in good shape.
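
The savings come down to round-trip counting: with requests serialized on one connection, 1000 single gets cost 1000 round trips, while grouping keys 5 at a time costs only 200. The latency figure below is an illustrative assumption, not a measurement.

```python
# Back-of-the-envelope model of the round-trip savings from batching
# keys into multiget_slice calls on a single connection.
def total_latency(n_keys, group_size, rtt_ms):
    """Each serialized round trip costs rtt_ms; batching divides the count."""
    round_trips = -(-n_keys // group_size)  # ceiling division
    return round_trips * rtt_ms

single = total_latency(1000, 1, rtt_ms=1.0)   # 1000 round trips
grouped = total_latency(1000, 5, rtt_ms=1.0)  # 200 round trips
```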

I was also theorizing that a CF using RowCache with a very, very high
read rate would benefit from "pooling" a bunch of reads together with
I do agree that the first time I looked at the multiget_slice
signature I realized I could not do many of the things I was expecting
from a multi-get.
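
The asymmetry the thread keeps coming back to can be seen in the argument shapes, paraphrased below as plain dicts (these mirror the Thrift structures loosely, they are not literal Thrift types). batch_mutate takes a per-key payload; multiget_slice takes one shared predicate.

```python
# batch_mutate accepts per-key payloads, so each key can carry
# different mutations (shape: key -> column_family -> mutations):
batch_mutate_arg = {
    "key1": {"Users": ["insert name"]},
    "key2": {"Users": ["insert email", "delete phone"]},
}

# multiget_slice takes one SlicePredicate shared by every key, so you
# cannot request different columns per key in a single call:
multiget_slice_args = {
    "keys": ["key1", "key2"],
    "predicate": ["name", "email"],  # same columns fetched for both keys
}

# A batch-get symmetrical to batch_mutate would instead accept a
# per-key predicate, e.g. (hypothetical, not in the API):
hypothetical_batch_get_arg = {
    "key1": ["name"],
    "key2": ["email", "phone"],
}
```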
