cassandra-user mailing list archives

From Ali Akhtar <ali.rac...@gmail.com>
Subject Re: Updating only modified records (where lastModified < current date)
Date Wed, 13 May 2015 11:09:58 GMT
If I specify USING TIMESTAMP, the docs say to provide microseconds, but
where are these microseconds obtained from? I have regular java.util.Date
objects, and I can get the time in milliseconds (i.e. the Unix timestamp).
How would I convert that to microseconds?
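A minimal sketch of the conversion, assuming the java.util.Date already holds the lastModified instant you want to use. Because Date only has millisecond precision, the microsecond value is the millisecond value times 1000. `TimeUnit.MILLISECONDS.toMicros` is standard JDK; the class and method names are illustrative.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class MicrosFromDate {
    // CQL USING TIMESTAMP expects microseconds since the Unix epoch.
    // Date.getTime() returns milliseconds since the epoch, so scale by 1000.
    // The last three digits of the result are always zero (no sub-ms precision).
    static long toMicros(Date date) {
        return TimeUnit.MILLISECONDS.toMicros(date.getTime());
    }

    public static void main(String[] args) {
        Date lastModified = new Date(1431515398000L); // 2015-05-13T11:09:58Z
        System.out.println(toMicros(lastModified));   // prints 1431515398000000
    }
}
```

Whatever unit you pick, every writer to the table should use the same one; a mix of millisecond and microsecond timestamps would make the comparisons meaningless.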

On Wed, May 13, 2015 at 3:56 PM, Ali Akhtar <ali.rac200@gmail.com> wrote:

> Thanks Peter, that's interesting. I didn't know of that option.
>
> If updates don't create tombstones (and I'm already taking pains to ensure
> no nulls are present in queries), then is there no cost to just submitting
> an update for everything, regardless of whether lastModified has changed?
>
> Thanks.
>
> On Wed, May 13, 2015 at 3:38 PM, Peer, Oded <Oded.Peer@rsa.com> wrote:
>
>>  You can use the “last modified” value as the TIMESTAMP for your UPDATE
>> operation.
>>
>> This way the values will only be updated if the lastModified date is
>> later than the lastModified you have in the DB.
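This suggestion relies on Cassandra resolving conflicting writes to the same cell by write timestamp (last write wins). Below is a simplified in-memory model of that rule for one column per key, plus a hypothetical prepared-statement string for `myTable`; the `name` column and the bind marker for the timestamp are illustrative assumptions, not taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class LastWriteWinsSketch {
    // Hypothetical CQL for illustration only: the timestamp is the record's
    // lastModified converted to microseconds, bound like any other value.
    static String updateCql() {
        return "UPDATE myTable USING TIMESTAMP ? SET name = ?, lastModified = ? WHERE id = ?";
    }

    // One stored cell: its value plus the write timestamp it was written with.
    static final class Cell {
        final String value;
        final long writeTimeMicros;

        Cell(String value, long writeTimeMicros) {
            this.value = value;
            this.writeTimeMicros = writeTimeMicros;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    // Simplified last-write-wins: keep the incoming value only if its timestamp
    // is newer. Equal timestamps keep the stored value here; Cassandra has its
    // own tie-break rule, which this sketch does not model.
    void write(String id, String value, long writeTimeMicros) {
        Cell current = cells.get(id);
        if (current == null || writeTimeMicros > current.writeTimeMicros) {
            cells.put(id, new Cell(value, writeTimeMicros));
        }
    }

    String read(String id) {
        Cell c = cells.get(id);
        return c == null ? null : c.value;
    }

    public static void main(String[] args) {
        LastWriteWinsSketch table = new LastWriteWinsSketch();
        long t1 = 1_431_515_398_000_000L; // first version's lastModified, in microseconds
        table.write("42", "v1", t1);
        table.write("42", "v1-again", t1 - 1_000_000L); // older lastModified: loses
        table.write("42", "v2", t1 + 1_000_000L);       // newer lastModified: wins
        System.out.println(table.read("42"));           // prints v2
        System.out.println(updateCql());
    }
}
```

This model discards the stale write immediately. Cassandra does not read before writing, so an older-timestamped write is still accepted and stored; it simply loses when cells are reconciled on read or compaction.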
>>
>> Updates to values don’t create tombstones. Only deletes (executing a
>> DELETE, inserting a null value, or setting a TTL) create tombstones.
>>
>> *From:* Ali Akhtar [mailto:ali.rac200@gmail.com]
>> *Sent:* Wednesday, May 13, 2015 1:27 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Updating only modified records (where lastModified < current
>> date)
>>
>> I'm running some ETL jobs where the pattern is the following:
>>
>> 1- Get some records from an external API.
>>
>> 2- For each record, see if its lastModified date > the lastModified I
>> have in the DB (or if I don't have that record in the DB at all).
>>
>> 3- If lastModified < dbLastModified, the item wasn't changed, so ignore it.
>> Otherwise, run an update query and update that record.
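The read-before-write pattern in steps 1-3 can be sketched as follows, with a hypothetical `ApiRecord` type and an in-memory map standing in for `myTable`. It treats an equal lastModified as unchanged (the `>` test in step 2) and counts point reads, which makes the cost visible: one read per incoming record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadBeforeWriteEtl {
    // Hypothetical record shape returned by the external API (step 1).
    static final class ApiRecord {
        final String id;
        final long lastModifiedMillis;

        ApiRecord(String id, long lastModifiedMillis) {
            this.id = id;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // In-memory stand-in for myTable: id -> lastModified (milliseconds).
    private final Map<String, Long> db = new HashMap<>();
    int reads = 0;

    // Stands in for: select lastModified from myTable where id = ?
    Long selectLastModified(String id) {
        reads++;
        return db.get(id);
    }

    // Steps 2 and 3: update only records that are new or strictly newer.
    List<String> run(List<ApiRecord> records) {
        List<String> updated = new ArrayList<>();
        for (ApiRecord r : records) {
            Long dbLastModified = selectLastModified(r.id);
            if (dbLastModified != null && r.lastModifiedMillis <= dbLastModified) {
                continue; // unchanged: skip it
            }
            db.put(r.id, r.lastModifiedMillis); // stands in for the UPDATE query
            updated.add(r.id);
        }
        return updated;
    }

    public static void main(String[] args) {
        ReadBeforeWriteEtl job = new ReadBeforeWriteEtl();
        job.db.put("a", 100L);
        job.db.put("b", 100L);
        List<ApiRecord> batch = List.of(
                new ApiRecord("a", 100L),  // same lastModified: skipped
                new ApiRecord("b", 200L),  // newer: updated
                new ApiRecord("c", 50L));  // not in DB yet: inserted
        System.out.println(job.run(batch)); // prints [b, c]
        System.out.println(job.reads);      // prints 3
    }
}
```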
>>
>> (It is rare for existing records to get updated, so I'm not that
>> concerned about tombstones.)
>>
>> The problem, however, is that since I have to query each record's
>> lastModified one at a time, that's adding a major bottleneck to my job.
>>
>> E.g. if I have 6k records, I have to run a total of 6k 'select
>> lastModified from myTable where id = ?' queries.
>>
>> Is there a better way, or am I doing something wrong? Any suggestions
>> would be appreciated.
>>
>> Thanks.
>>
>
>
