cassandra-user mailing list archives

From aaron morton <>
Subject Re: Reading whole row vs a range of columns (pycassa)
Date Mon, 21 Mar 2011 00:12:22 GMT
Internally, a multiget just turns into a series of single-row gets. There is no seek and partial
scan such as you might see when reading from a clustered index in an RDBMS.

Unless you have a performance problem and you've tried other things, I'd put this idea on the
back burner. There are many other factors that impact read performance, and OPP (the
OrderPreservingPartitioner) requires a lot more care than RP (the RandomPartitioner).
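To make the point concrete, here is a minimal sketch in plain Python (no Cassandra required) of how a multiget degenerates into N independent single-row lookups rather than one seek-and-scan. The in-memory `store`, the key names, and the helper names are illustrative assumptions, not pycassa API:

```python
# Toy stand-in for a row store: row key -> {column: value}.
store = {
    "row:123": {"name": "obj123", "size": "10"},
    "row:124": {"name": "obj124", "size": "10"},
    "row:125": {"name": "obj125", "size": "10"},
}

def single_get(key):
    """One row lookup -- each of these is an independent read internally."""
    return store.get(key)

def multiget(keys):
    """A multiget is just a loop of single gets, one per requested row."""
    return {key: single_get(key) for key in keys if key in store}

result = multiget(["row:123", "row:125", "row:999"])
# Only the rows that exist come back; each one cost a separate lookup,
# unlike an RDBMS clustered-index range read that scans contiguous data.
```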
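For reference, the row-bucketing idea discussed in the quoted thread below (grouping objects into rows of 100 by id, so a serial request like id=[123..145] hits one row instead of ~23) can be sketched like this. The key format and `BUCKET_SIZE` are illustrative assumptions:

```python
BUCKET_SIZE = 100

def row_key(object_id):
    """All objects sharing the same id // 100 land in the same row."""
    return f"bucket:{object_id // BUCKET_SIZE}"

def column_name(object_id):
    """Within a row, each object gets its own column."""
    return f"obj:{object_id}"

def rows_for_range(first_id, last_id):
    """Distinct rows a serial read [first_id, last_id] must touch."""
    return sorted({row_key(i) for i in range(first_id, last_id + 1)})

# id=123..145 all fall in bucket 1, so only one row is read.
print(rows_for_range(123, 145))   # ['bucket:1']
# A range crossing a bucket boundary touches two rows.
print(rows_for_range(95, 105))    # ['bucket:0', 'bucket:1']
```

Because the bucketing happens in the row key, this works under the RandomPartitioner; no OPP is needed for serial reads within a bucket.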

On 21 Mar 2011, at 11:36, buddhasystem wrote:

> Aaron, thanks for chiming in.
> I'm doing what you said, i.e. all the data for a single object (which is quite
> lean, with about 100 attributes of 10 bytes each) goes into a single column, as
> opposed to the previous version of my application, which mapped all attributes
> of each small object to individual columns.
> So yes, I did consider having 100 objects in a single column, but that is
> suboptimal for many reasons (it's hard to add objects later).
> My reference to OPP was this -- if I had stuck with the original design, it
> could have been advantageous to have OPP, since statistically requests for
> objects are often serial; e.g. people often don't query for just one object
> with id=123, but for a series like id=[123..145]. If I bunch these into rows
> containing 100 objects each, that promises some efficiency right there, as I
> read one row as opposed to, say, 50.
> aaron morton wrote:
>> I'd collapse all the data for a single object into a single column; I'm not
>> sure about storing 100 objects in a single column, though.
>> Have you considered any concurrency issues? E.g. multiple threads /
>> processes wanting to update different objects in the same group of 100?
>> I don't understand your reference to OPP in the context of reading 100
>> columns from a row.
>> Aaron
>> On 19 Mar 2011, at 16:22, buddhasystem wrote:
>> > As I'm working on this further, I want to understand this:
>> > 
>> > Is it advantageous to flatten data in blocks (strings), each containing a
>> > series of objects, if I know that a serial object read is often likely, but
>> > don't want to resort to OPP? I worked out the optimal granularity, it seems.
>> > Is it better to read a serialized single column with 100 objects than a row
>> > consisting of a hundred columns, each modeling an object?
>> > 
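The two layouts compared in the thread above can be sketched side by side in plain Python. This is a hedged illustration of the trade-off, not Cassandra code; the object shapes and names are assumptions:

```python
import json

# 100 small objects, roughly matching the thread's description.
objects = {f"obj:{i}": {"attr": f"value-{i}"} for i in range(100)}

# Layout (a): one column holding the whole group as a serialized blob.
one_column_row = {"payload": json.dumps(objects)}

# Layout (b): one row of 100 columns, each object serialized on its own.
many_column_row = {name: json.dumps(obj) for name, obj in objects.items()}

# Reading one object from layout (a) forces deserializing all 100 ...
one = json.loads(one_column_row["payload"])["obj:42"]
# ... while layout (b) deserializes only the column asked for, and a new
# object can be added as one more column without rewriting the whole blob.
other = json.loads(many_column_row["obj:42"])
assert one == other
```

This is the "hard to add objects later" point from the thread: layout (a) must be read, modified, and rewritten to change any member, while layout (b) updates one column at a time (which also sidesteps the concurrency concern Aaron raised).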
