cassandra-user mailing list archives

From Robert Wille <rwi...@fold3.com>
Subject Re: Upserting the same values multiple times
Date Wed, 22 Jan 2014 06:59:32 GMT
No tombstones, just many copies of the same data until compaction occurs.
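
As a rough illustration of that point (a toy model, not Cassandra's actual storage code): overwriting a row with an upsert writes another timestamped copy of the cell, and compaction later keeps only the newest copy per key. Tombstones come from deletes, TTL expiry, or explicitly writing nulls, not from plain overwrites. The `compact` helper and the dict-based "SSTables" below are hypothetical names made up for this sketch.

```python
def compact(sstables):
    """Merge toy SSTables, keeping the newest (value, timestamp) per key."""
    merged = {}
    for sstable in sstables:
        for key, (value, ts) in sstable.items():
            # Last write wins: keep the cell with the highest timestamp.
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    return merged


# Three flushes, each holding a copy of the same <digest, body> upsert.
sstables = [{"d1": ("same body", ts)} for ts in (1, 2, 3)]

copies_before = sum(len(s) for s in sstables)  # redundant copies on disk
compacted = compact(sstables)                  # one copy survives
```

In this model the cost of repeated upserts is extra disk space and compaction work until the copies are merged, rather than tombstone pressure on reads.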

From:  Sanjeeth Kumar <sanjeeth@exotel.in>
Reply-To:  <user@cassandra.apache.org>
Date:  Tuesday, January 21, 2014 at 8:37 PM
To:  <user@cassandra.apache.org>
Subject:  Upserting the same values multiple times

Hi,
   I have a table A, one of whose fields is a text column called body.
The length of this text can vary from about 120 characters to, say, 400
characters. The contents of this column can be the same for millions of
rows.

To avoid storing the same data repeatedly, I thought I would add another
table B, which stores <MD5Hash(body), body>.

Table A {
    some fields;
    ....
    digest text,
    .....
}
  

CREATE TABLE B (
  digest text,
  body text,
  PRIMARY KEY (digest)
);

Whenever I insert into table A, I calculate the digest of body and blindly
issue an insert into table B as well. I'm not doing any reads on B. This
could result in the same <digest, body> being inserted millions of times in
a short span of time.
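
The write path described above could be sketched roughly as follows. This is a minimal in-memory model, not real driver code: `table_a`, `table_b`, `body_digest`, and `insert_message` are hypothetical names, and a Python dict stands in for table B's PRIMARY KEY (digest) semantics.

```python
import hashlib


def body_digest(body: str) -> str:
    """Return the hex MD5 digest of the UTF-8 encoded body."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# Hypothetical in-memory stand-ins for tables A and B.
table_a = []  # rows that store the digest instead of the full body
table_b = {}  # digest -> body, mimicking PRIMARY KEY (digest)


def insert_message(row_id: int, body: str) -> None:
    digest = body_digest(body)
    table_a.append({"id": row_id, "digest": digest})
    table_b[digest] = body  # blind upsert: no read before the write


# The same body inserted many times yields one logical row in B.
for i in range(1000):
    insert_message(i, "Your call has been connected.")
```

Logically, B ends up with a single row per distinct body no matter how many times the upsert is repeated; what differs in a real cluster is how many physical copies exist before compaction.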

A couple of questions:

1) Would this cause an issue due to the number of tombstones created in a
short span of time? I'm assuming that for every insert, a tombstone would
be created for the previous record.
2) Or should I just replicate the same data in table A itself multiple times
(with compression, space isn't that big an issue)?


- Sanjeeth


