incubator-cassandra-user mailing list archives

From Dr. Martin Grabmüller <Martin.Grabmuel...@eleven.de>
Subject RE: Update vs Delete/Insert
Date Wed, 16 Jun 2010 10:00:44 GMT
Hi Colin, 

> From: Colin Vipurs [mailto:zodiaczx6@gmail.com] 
[...]
> I've got some data that I'm doing counts on, stored in a CF as:
> 
> <lhid> {
>     <rhid1> : <count>
>     <rhid2> : <count>
>     ....
> }
[...]
> <lhid> {
>    <count-rhid1> : PLACEHOLDER
>    <count-rhid2> : PLACEHOLDER
> }
> 
> would be a better way of storing the data? Does anyone know the
> relative performance differences between doing the insert in the first
> instance and a delete/insert in the second?

I can't say anything about performance differences, but I think it will
not matter, as you are about to insert the same amount of data.

Just keep the following in mind:

- With the second scheme, it is more difficult to delete individual columns,
  because you have to know the count and the name to construct the column
  name.  You can iterate over the columns to find the names, of course, but
  this may or may not work for you.

  Maybe you want to store the rhids instead of the placeholders to solve
  that problem.

- You will need to left-pad the counts with zeros so that lexicographical
  ordering works.

- (may be irrelevant, but anyway) there is a limit on column names which
  AFAIK is lower than the limit on column values.
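
To illustrate the padding point, here is a minimal sketch in plain Python
(no Cassandra client involved). The width of 10 digits, the "-" separator
and the helper name column_name are assumptions made up for this example;
the sample counts and rhids are invented.

```python
# Sketch only: builds "<count>-<rhid>" column names as plain strings.
WIDTH = 10  # assumed fixed width; must fit the largest count you expect


def column_name(count, rhid):
    """Return a column name with the count left-padded with zeros."""
    return f"{count:0{WIDTH}d}-{rhid}"


# Invented sample data: (count, rhid) pairs.
samples = [(7, "rhid1"), (120, "rhid2"), (15, "rhid3")]

# Without padding, string ordering does not match numeric ordering.
unpadded = sorted(f"{count}-{rhid}" for count, rhid in samples)

# With padding, string ordering matches numeric ordering of the counts.
padded = sorted(column_name(count, rhid) for count, rhid in samples)

print(unpadded)
print(padded)
```

Without padding, "120-rhid2" sorts before "7-rhid1"; with padding the names
come out in count order. This only holds for non-negative counts that fit
in the chosen width.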

Cheers,
  Martin
