cassandra-user mailing list archives

From Viktor Jevdokimov <Viktor.Jevdoki...@adform.com>
Subject RE: Upserting the same values multiple times
Date Wed, 22 Jan 2014 06:50:17 GMT
It's not about tombstones. Tombstones are essentially markers for deleted columns (removed by
an explicit delete or by TTL expiry). They are carried into new sstables during compaction so
that the deletion is retained for the gc_grace period.

Updates do not create tombstones for previous records. The version with the latest timestamp
is the one that is kept, whether it is flushed from the memtable or merged from sstables during
compaction.

While data is in the memtable, the latest timestamp wins, and only the latest version is flushed
to disk. After that, everything depends on how often you flush memtables and how compaction
proceeds. Do not expect any tombstones from updates, only from deleting columns.
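
To illustrate the "latest timestamp wins" rule described above, here is a minimal Python sketch.
It is a toy model, not Cassandra's actual code: the name merge_cells and the (timestamp, value)
cell layout are assumptions made for illustration, and timestamp ties are ignored.

```python
from typing import Dict, Iterable, Tuple

# A cell is modeled as (write_timestamp, value); this layout is an assumption,
# the real storage format carries more information.
Cell = Tuple[int, str]


def merge_cells(sources: Iterable[Dict[str, Cell]]) -> Dict[str, Cell]:
    """Merge column maps from several sources, keeping the newest cell per column."""
    merged: Dict[str, Cell] = {}
    for source in sources:
        for column, cell in source.items():
            current = merged.get(column)
            # Keep the cell with the higher timestamp. The older version is
            # simply dropped; no deletion marker is produced for it.
            if current is None or cell[0] > current[0]:
                merged[column] = cell
    return merged


# Three writes of the same value to the same column, at increasing timestamps.
older_sstable = {"body": (100, "hello")}
newer_sstable = {"body": (200, "hello")}
memtable = {"body": (300, "hello")}

result = merge_cells([older_sstable, newer_sstable, memtable])
```

In this model the repeated writes collapse to a single cell with the newest timestamp, which is
the behaviour to expect from repeated upserts of the same value.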


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Sanjeeth Kumar [mailto:sanjeeth@exotel.in]
Sent: Wednesday, January 22, 2014 5:37 AM
To: user@cassandra.apache.org
Subject: Upserting the same values multiple times

Hi,
   I have a table A, one of whose fields is a text column called body.
 The length of this text can vary from about 120 to, say, 400 characters. The
contents of this column can be the same for millions of rows.
To avoid repeating the same data, I thought I would add another table B, which stores
<MD5Hash(body), body>.
Table A {
    some fields;
    ....
    digest text,
    .....
}


TABLE B (
  digest text,
  body text,
  PRIMARY KEY (digest)
)
Whenever I insert into table A, I calculate the digest of body and blindly issue an insert
into table B as well. I'm not doing any reads on B. This could result in the same <digest,
body> being inserted millions of times in a short span of time.
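
The write path described above can be sketched in Python roughly as follows. This is only a
toy model under stated assumptions: table B is represented by an in-memory dict, and the names
body_digest, upsert_body and table_b are hypothetical; no Cassandra driver is involved.

```python
import hashlib
from typing import Dict


def body_digest(body: str) -> str:
    """Return the hex MD5 digest of the body text (UTF-8 encoding assumed)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# Stand-in for table B: digest is the primary key, body is the value.
table_b: Dict[str, str] = {}


def upsert_body(body: str) -> str:
    """Blindly write <digest, body> without reading first, and return the digest."""
    digest = body_digest(body)
    # Writing the same key again overwrites the same row with identical data.
    table_b[digest] = body
    return digest


first = upsert_body("Your call is being connected")
second = upsert_body("Your call is being connected")
```

Because the digest is derived from the body, both calls target the same primary key, so the
table holds a single row for that body no matter how many times the insert is repeated.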
A couple of questions:
1) Would this cause an issue due to the number of tombstones created in a short span of time?
I'm assuming that for every insert, a tombstone would be created for the previous record.
2) Or should I just replicate the same data in Table A itself multiple times (with compression,
space isn't that big an issue)?

- Sanjeeth
