incubator-cassandra-user mailing list archives

From aaron morton <>
Subject Re: Update vs Delete/Insert
Date Wed, 16 Jun 2010 12:05:57 GMT
It may make sense to use a secondary index for the counts. You could store the counts in both
places and use a batch mutation to update them. It does not give you a transaction guarantee,
but it will mean you still make one request to Cassandra. 


<lhid> {
	<rhid1> : <count>
	<rhid2> : <count>
}

The secondary index can be in the same CF, with a tweak to the key.

<lhid.count_index> {
	<count> : <rhid1>
	<count> : <rhid2>
}
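
As a rough sketch of the "store it in both places" idea, the Python below only builds a description of the writes for one count change. It uses plain dicts, not a real Cassandra client; the function name build_count_mutations and the "insert"/"delete" layout are made up for illustration, and you would still need to translate the result into whatever batch call your client offers.

```python
def build_count_mutations(lhid, rhid, old_count, new_count):
    """Describe the writes needed when the count for (lhid, rhid) changes.

    Hypothetical helper: returns plain dicts keyed by row key and does not
    talk to Cassandra. Both rows are meant to go into a single batch.
    """
    index_key = f"{lhid}.count_index"  # the "tweaked" key for the index row
    mutations = {
        # Primary row: <rhid> : <count>
        lhid: {"insert": {rhid: new_count}, "delete": []},
        # Index row: <count> : <rhid>
        index_key: {"insert": {new_count: rhid}, "delete": []},
    }
    # The old index column has to go, otherwise the index keeps a stale count.
    if old_count is not None and old_count != new_count:
        mutations[index_key]["delete"].append(old_count)
    return mutations


batch = build_count_mutations("lhid1", "rhid1", old_count=3, new_count=4)
```

Note that in this layout two ids with the same count share one column name, so the second write would overwrite the first, and deleting an old count would drop whichever id is stored there.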

Are the counts going to be unique? If not, you may want to store the secondary index in a super
CF, where the super column name is the count and the columns in it are the ids that have
that count.
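
To make the super CF shape concrete, here is a small in-memory model of it in Python: the outer dict key plays the role of the super column name (the count) and the inner dict holds one column per id. The helper names are invented for this example and nothing here calls Cassandra; it is only meant to show how ids with equal counts can sit side by side.

```python
def index_add(index, count, rhid):
    # One "super column" per count, one column per id under it.
    index.setdefault(count, {})[rhid] = ""


def index_remove(index, count, rhid):
    ids = index.get(count, {})
    ids.pop(rhid, None)
    if not ids:
        # Drop the now-empty super column, if it was there at all.
        index.pop(count, None)


def index_move(index, rhid, old_count, new_count):
    # A count change is a remove from the old group plus an add to the new one.
    index_remove(index, old_count, rhid)
    index_add(index, new_count, rhid)


index = {}
index_add(index, 2, "rhid1")
index_add(index, 2, "rhid2")      # same count, no collision
index_move(index, "rhid1", 2, 3)  # rhid1 goes from 2 to 3
```

After the move, count 2 still lists rhid2 and count 3 lists rhid1, so an update only touches the one id that changed.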


On 16 Jun 2010, at 22:00, Dr. Martin Grabmüller wrote:

> Hi Colin, 
>> From: Colin Vipurs [] 
> [...]
>> I've got some data that I'm doing counts on, stored in a CF as:
>> <lhid> {
>>    <rhid1> : <count>
>>    <rhid2> : <count>
>>    ....
>> }
> [...]
>> <lhid> {
>>   <count-rhid1> : PLACEHOLDER
>>   <count-rhid2> : PLACEHOLDER
>> }
>> would be a better way of storing the data? Does anyone know the
>> relative performance differences between doing the insert in the first
>> instance and a delete/insert in the second?
> I can't say anything about performance differences, but I think it will
> not matter, as you are about to insert the same amount of data.
> Just keep the following in mind:
> - With the second scheme, it is more difficult to delete individual columns,
>  because you have to know the count and the name to construct the column
>  name.  You can iterate over the columns to find the names, of course, but
>  this may or may not work for you.
>  Maybe you want to store the rhids instead of the placeholders to solve
>  that problem.
> - You will need to left-pad the counts with zeros so that lexicographical
>  ordering works.
> - (may be irrelevant, but anyway) there is a limit on column names which
>  AFAIK is lower than the limit on column values.
> Cheers,
>  Martin
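
On Martin's left-padding point, a quick Python illustration of why it matters when the count is part of the column name. The width of 10 digits and the "-" separator are assumptions for the example; pick whatever fits your maximum count and your id format.

```python
PAD_WIDTH = 10  # assumed width; must be large enough for the biggest count


def count_column_name(count, rhid, width=PAD_WIDTH):
    # Zero-pad the count so that string order matches numeric order.
    return f"{count:0{width}d}-{rhid}"


pairs = [(10, "a"), (9, "b"), (100, "c")]

# Without padding, names sort as strings and 9 ends up after 100.
unpadded = sorted(f"{count}-{rhid}" for count, rhid in pairs)

# With padding, lexicographic order is the same as numeric order.
padded = sorted(count_column_name(count, rhid) for count, rhid in pairs)
```

Here unpadded comes out as 10-a, 100-c, 9-b, while padded comes out as 0000000009-b, 0000000010-a, 0000000100-c.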
