cassandra-user mailing list archives

From Ali Akhtar <ali.rac...@gmail.com>
Subject Updating only modified records (where lastModified < current date)
Date Wed, 13 May 2015 10:26:42 GMT
I'm running some ETL jobs, where the pattern is the following:

1- Get some records from an external API,

2- For each record, see if its lastModified date > the lastModified I have
in the db (or if I don't have that record in the db)

3- If lastModified <= dbLastModified, the item wasn't changed, so ignore it.
Otherwise, run an update query and update that record.
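
To make the comparison in steps 2 and 3 concrete, here is a minimal sketch in Python. It is not the actual job code. It assumes each record is a dict with hypothetical `id` and `lastModified` keys, and that the stored lastModified values are available as a dict keyed by id (in the real job that lookup is the per-record select). The name `select_changed` is made up for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def select_changed(records: Iterable[dict],
                   db_last_modified: Dict[str, datetime]) -> List[dict]:
    """Return the records that are missing from the db or newer than the stored copy."""
    changed = []
    for record in records:
        # Hypothetical lookup: None means the record is not in the db yet.
        stored = db_last_modified.get(record["id"])
        if stored is None or record["lastModified"] > stored:
            changed.append(record)  # needs an update (or an insert)
        # Otherwise the stored copy is at least as recent, so skip it.
    return changed


if __name__ == "__main__":
    api_records = [
        {"id": "a", "lastModified": datetime(2015, 5, 13)},  # newer than db
        {"id": "b", "lastModified": datetime(2015, 5, 1)},   # same as db
        {"id": "c", "lastModified": datetime(2015, 5, 2)},   # not in db
    ]
    db_state = {"a": datetime(2015, 5, 1), "b": datetime(2015, 5, 1)}
    print([r["id"] for r in select_changed(api_records, db_state)])  # ['a', 'c']
```

Only the records returned here would go on to the update query; everything else is ignored.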

(It is rare for existing records to get updated, so I'm not that concerned
about tombstones).

The problem however is, since I have to query each record's lastModified,
one at a time, that's adding a major bottleneck to my job.

E.g., if I have 6k records, I have to run a total of 6k 'select lastModified
from myTable where id = ?' queries.

Is there a better way, or am I doing anything wrong? Any suggestions
would be appreciated.

Thanks.
