Is there a way, in the Java driver, to get the number of rows that an update was applied to?

On Wed, May 13, 2015 at 4:33 PM, Ali Akhtar <ali.rac200@gmail.com> wrote:
Thanks. So supplying the timestamp with the update (via USING TIMESTAMP) should fix that, right? (By skipping updates where lastModified < dbLastModified.)

I'm currently doing TimeUnit.MILLISECONDS.toMicros(myDate.getTime()), and that has worked for inserts. However, how do I verify that future updates are ignored and aren't run again?
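
One possible way to check this, as a rough sketch: CQL's WRITETIME function returns the timestamp of the value currently stored in a column, so it can be compared with the timestamp that was sent. The class below only holds the query text and the comparison; executing the query needs a driver session, which is not shown. The helper names and the `payload` column are assumptions for illustration; `myTable` and `id` come from the original question.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: compares a stored write timestamp with the one we sent.
class WriteTimeCheck {

    // WRITETIME applies to non-primary-key columns; "payload" is an assumed column name.
    static final String WRITETIME_CQL =
            "SELECT WRITETIME(payload) FROM myTable WHERE id = ?";

    // Same conversion as in the message above: milliseconds to microseconds.
    static long toMicros(Date lastModified) {
        return TimeUnit.MILLISECONDS.toMicros(lastModified.getTime());
    }

    // True when the stored value carries a newer timestamp than the update we sent,
    // i.e. our update does not show up on reads.
    static boolean isShadowed(long storedWriteTimeMicros, Date sentLastModified) {
        return storedWriteTimeMicros > toMicros(sentLastModified);
    }

    public static void main(String[] args) {
        Date sent = new Date(1_000L);                      // we sent timestamp 1_000_000 us
        System.out.println(isShadowed(2_000_000L, sent));  // prints "true"
        System.out.println(isShadowed(1_000_000L, sent));  // prints "false"
    }
}
```

Note that under this scheme the older update is still sent and written; it simply loses to the newer value when the row is read.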

On Wed, May 13, 2015 at 4:29 PM, Ken Hancock <ken.hancock@schange.com> wrote:
While updates don't create tombstones, overwrites create a similar performance penalty at the read phase.  That key will need to be fetched from every SSTable where it resides so the "most recent" column can be returned.
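
To illustrate the read-side cost described above, here is a toy model, not Cassandra's actual read path: each map stands in for one SSTable, and a read has to look at every one that holds the key before it can return the version with the highest write timestamp. All class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: each Map stands in for one SSTable (row key -> stored cell).
class ReadReconciliation {

    // One stored version of a column: its value and write timestamp in microseconds.
    static final class Cell {
        final String value;
        final long timestampMicros;

        Cell(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    // Checks every "SSTable" for the key and keeps the cell with the highest timestamp.
    static Cell read(List<Map<String, Cell>> sstables, String key) {
        Cell newest = null;
        for (Map<String, Cell> sstable : sstables) {
            Cell candidate = sstable.get(key);
            if (candidate != null
                    && (newest == null || candidate.timestampMicros > newest.timestampMicros)) {
                newest = candidate;
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        List<Map<String, Cell>> sstables = new ArrayList<>();
        // Three flushes, each holding a different version of the same key.
        long[] timestamps = {2_000L, 3_000L, 1_000L};
        for (long ts : timestamps) {
            Map<String, Cell> sstable = new HashMap<>();
            sstable.put("id-1", new Cell("v@" + ts, ts));
            sstables.add(sstable);
        }
        // The write stamped 1_000 arrived last but still loses to the one stamped 3_000.
        System.out.println(read(sstables, "id-1").value); // prints "v@3000"
    }
}
```

The more overwritten versions of a key are spread across files, the more lookups a single read performs in this model, which is the penalty being described.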

On Wed, May 13, 2015 at 6:38 AM, Peer, Oded <Oded.Peer@rsa.com> wrote:

You can use the “last modified” value as the TIMESTAMP for your UPDATE operation.

This way the new values will only take effect if the record's lastModified date is later than the lastModified you have in the DB.
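
A minimal sketch of what such a statement could look like, assuming the record's lastModified is a java.util.Date. It only builds the CQL text; executing it needs a driver session, which is not shown. `myTable`, `id` and `lastModified` are taken from the original question, while the `payload` column and the class and method names are hypothetical.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Sketch only: builds the CQL text for an update stamped with the record's lastModified.
class TimestampedUpdate {

    // Write timestamps are conventionally expressed in microseconds since the epoch.
    static long toWriteTimestampMicros(Date lastModified) {
        return TimeUnit.MILLISECONDS.toMicros(lastModified.getTime());
    }

    // "payload" is a hypothetical column; the ? markers are bound when the statement runs.
    static String updateCql(long writeTimestampMicros) {
        return "UPDATE myTable USING TIMESTAMP " + writeTimestampMicros
                + " SET payload = ?, lastModified = ? WHERE id = ?";
    }

    public static void main(String[] args) {
        Date lastModified = new Date(1431513180000L); // arbitrary example instant
        System.out.println(updateCql(toWriteTimestampMicros(lastModified)));
        // prints "UPDATE myTable USING TIMESTAMP 1431513180000000 SET payload = ?, lastModified = ? WHERE id = ?"
    }
}
```

With this approach the job can send every record without first reading the stored lastModified, since an update carrying an older timestamp does not replace a newer value.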

Updates to values don’t create tombstones. Only deletes (by executing a DELETE, inserting a null value, or setting a TTL) create tombstones.

From: Ali Akhtar [mailto:ali.rac200@gmail.com]
Sent: Wednesday, May 13, 2015 1:27 PM
To: user@cassandra.apache.org
Subject: Updating only modified records (where lastModified < current date)

I'm running some ETL jobs, where the pattern is the following:

1- Get some records from an external API,

2- For each record, see if its lastModified date > the lastModified I have in the DB (or if I don't have that record in the DB),

3- If lastModified < dbLastModified, the item wasn't changed, so ignore it. Otherwise, run an update query and update that record.

(It is rare for existing records to get updated, so I'm not that concerned about tombstones.)

The problem, however, is that since I have to query each record's lastModified one at a time, that's adding a major bottleneck to my job.

E.g. if I have 6k records, I have to run a total of 6k 'select lastModified from myTable where id = ?' queries.

Is there a better way? Am I doing anything wrong? Any suggestions would be appreciated.

Thanks.