cassandra-user mailing list archives

From Ali Akhtar <ali.rac...@gmail.com>
Subject Re: Updating only modified records (where lastModified < current date)
Date Wed, 13 May 2015 11:36:37 GMT
Is there a way in the Java driver to get the number of rows that an update
was applied to?

On Wed, May 13, 2015 at 4:33 PM, Ali Akhtar <ali.rac200@gmail.com> wrote:

> Thanks. So supplying the timestamp with the update (via USING TIMESTAMP)
> should fix that, right? (By skipping updates where lastModified < dbLastModified.)
>
> I'm currently doing TimeUnit.MILLISECONDS.toMicros(myDate.getTime())
> and that has worked for inserts. However, how do I verify that future
> updates are ignored and aren't run again?
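A minimal, dependency-free sketch of the approach being asked about, assuming a prepared statement whose write timestamp is bound like any other value. `myTable`, `id` and `lastModified` come from the thread; the `name` column and all class and method names are hypothetical. A plain (non-conditional) CQL UPDATE returns no per-row result, so one way to check whether an update was shadowed is to read back `WRITETIME(lastModified)` and compare it with the timestamp that was sent. Actually executing these statements (for example through the DataStax Java driver's `Session.prepare`) is not shown, to keep the example self-contained.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class TimestampedUpdate {

    // Prepared-statement text; the write timestamp is a bind marker like the other values.
    // "name" is a hypothetical column used only for illustration.
    static final String UPDATE_CQL =
            "UPDATE myTable USING TIMESTAMP ? SET name = ?, lastModified = ? WHERE id = ?";

    // Reads back the write timestamp Cassandra kept for the lastModified column.
    static final String WRITETIME_CQL =
            "SELECT WRITETIME(lastModified) FROM myTable WHERE id = ?";

    // Cassandra write timestamps are conventionally microseconds since the epoch,
    // while Date#getTime() returns milliseconds, hence the conversion from the thread.
    static long toWriteTimestamp(Date lastModified) {
        return TimeUnit.MILLISECONDS.toMicros(lastModified.getTime());
    }

    // Values in the same order as the bind markers in UPDATE_CQL.
    static Object[] updateBindValues(long id, String name, Date lastModified) {
        return new Object[] {toWriteTimestamp(lastModified), name, lastModified, id};
    }

    // If the WRITETIME read back is larger than the timestamp we sent,
    // a newer write is in place and our update was shadowed.
    static boolean newerWriteWon(long writetimeFromDb, long attemptedTimestamp) {
        return writetimeFromDb > attemptedTimestamp;
    }

    public static void main(String[] args) {
        Date lastModified = new Date(1431516997000L);
        Object[] values = updateBindValues(42L, "example", lastModified);
        System.out.println(UPDATE_CQL + " <- timestamp " + values[0]);
        System.out.println(WRITETIME_CQL);
    }
}
```

Reading WRITETIME back costs one query per row, so it suits spot checks rather than every record in the ETL run.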
>
> On Wed, May 13, 2015 at 4:29 PM, Ken Hancock <ken.hancock@schange.com>
> wrote:
>
>> While updates don't create tombstones, overwrites incur a similar
>> performance penalty at read time: the key has to be fetched from every
>> SSTable where it resides so that the "most recent" column can be
>> returned.
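A toy model of the read-time cost described above, assuming each SSTable holding the key contributes one version of the column and the read keeps the version with the highest write timestamp. The class and field names are hypothetical and this is not Cassandra's storage-engine code; tie-breaking for equal timestamps is not modeled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ReadReconciliation {

    // Hypothetical model: the version of one column held by a single SSTable.
    static final class CellVersion {
        final String sstable;
        final String value;
        final long writeTimestampMicros;

        CellVersion(String sstable, String value, long writeTimestampMicros) {
            this.sstable = sstable;
            this.value = value;
            this.writeTimestampMicros = writeTimestampMicros;
        }
    }

    // Every SSTable containing the key has to be consulted; the version with the
    // highest write timestamp is the one returned to the client.
    static Optional<CellVersion> reconcile(List<CellVersion> versionsFromSSTables) {
        return versionsFromSSTables.stream()
                .max(Comparator.comparingLong((CellVersion v) -> v.writeTimestampMicros));
    }

    public static void main(String[] args) {
        List<CellVersion> versions = List.of(
                new CellVersion("sstable-1", "original", 1_000L),
                new CellVersion("sstable-2", "overwrite", 3_000L),
                new CellVersion("sstable-3", "stale retry", 2_000L));
        CellVersion winner = reconcile(versions).get();
        // prints "3 SSTables read, winner: overwrite"
        System.out.println(versions.size() + " SSTables read, winner: " + winner.value);
    }
}
```

The more SSTables an overwritten key is spread across, the more versions the read has to merge, until compaction folds them together.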
>>
>> On Wed, May 13, 2015 at 6:38 AM, Peer, Oded <Oded.Peer@rsa.com> wrote:
>>
>>>  You can use the “last modified” value as the TIMESTAMP for your UPDATE
>>> operation.
>>>
>>> This way the values will only be updated if lastModified date > the
>>> lastModified you have in the DB.
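A minimal in-memory sketch of the suggestion above, assuming last-write-wins semantics per column: a write tagged with an older timestamp leaves the stored value unchanged. All names are hypothetical. In real Cassandra the stale write is still stored and is only shadowed at read time and dropped during compaction, which is why the read-time cost mentioned earlier in the thread still applies; exact timestamp ties are resolved by Cassandra's own rule, not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

public class LastWriteWins {

    // Hypothetical stand-in for one column per row: value plus write timestamp.
    static final class Cell {
        final String value;
        final long timestampMicros;

        Cell(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    private final Map<Long, Cell> cells = new HashMap<>();

    // Models the visible effect of "UPDATE ... USING TIMESTAMP t": the stored value
    // changes only if t is newer than the timestamp already held for that row.
    // Equal timestamps keep the current value in this model.
    void write(long id, String value, long timestampMicros) {
        Cell current = cells.get(id);
        if (current == null || timestampMicros > current.timestampMicros) {
            cells.put(id, new Cell(value, timestampMicros));
        }
    }

    String read(long id) {
        Cell cell = cells.get(id);
        return cell == null ? null : cell.value;
    }

    public static void main(String[] args) {
        LastWriteWins table = new LastWriteWins();
        // Records arriving in any order converge on the one with the newest lastModified.
        table.write(1L, "modified 2015-05-13", 3_000L);
        table.write(1L, "modified 2015-05-12", 2_000L); // older lastModified: shadowed
        System.out.println(table.read(1L)); // prints "modified 2015-05-13"
    }
}
```

Because the outcome depends only on the timestamps, the ETL job can write every record blindly without first reading the stored lastModified.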
>>>
>>>
>>>
>>> Updates to values don’t create tombstones. Only deletes (whether by
>>> executing a DELETE, inserting a null value or setting a TTL) create
>>> tombstones.
>>>
>>> *From:* Ali Akhtar [mailto:ali.rac200@gmail.com]
>>> *Sent:* Wednesday, May 13, 2015 1:27 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Updating only modified records (where lastModified < current
>>> date)
>>>
>>>
>>>
>>> I'm running some ETL jobs where the pattern is the following:
>>>
>>> 1- Get some records from an external API.
>>>
>>> 2- For each record, check whether its lastModified date > the lastModified I
>>> have in the DB (or whether I don't have that record in the DB at all).
>>>
>>> 3- If lastModified < dbLastModified, the item wasn't changed, so ignore it.
>>> Otherwise, run an update query and update that record.
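Steps 2 and 3 above can be sketched as a filter, assuming the per-record `select lastModified from myTable where id = ?` round trip is abstracted as a lookup function that returns null for a missing row. `ApiRecord`, `recordsToUpdate` and the lookup parameter are hypothetical names; the point is that the number of reads grows one-for-one with the number of records.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ReadBeforeWriteEtl {

    // Hypothetical shape of a record returned by the external API.
    static final class ApiRecord {
        final long id;
        final Date lastModified;

        ApiRecord(long id, Date lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    // Keeps a record if it is not in the DB yet or is strictly newer than the
    // stored copy (step 2). lookupDbLastModified stands in for one SELECT per record.
    static List<ApiRecord> recordsToUpdate(List<ApiRecord> fromApi,
                                           Function<Long, Date> lookupDbLastModified) {
        List<ApiRecord> toUpdate = new ArrayList<>();
        for (ApiRecord r : fromApi) {
            Date dbLastModified = lookupDbLastModified.apply(r.id); // one read per record
            if (dbLastModified == null || r.lastModified.after(dbLastModified)) {
                toUpdate.add(r);
            }
        }
        return toUpdate;
    }

    public static void main(String[] args) {
        Map<Long, Date> db = Map.of(1L, new Date(2000L), 2L, new Date(5000L));
        List<ApiRecord> api = List.of(
                new ApiRecord(1L, new Date(3000L)),  // newer than DB: update
                new ApiRecord(2L, new Date(4000L)),  // older than DB: skip
                new ApiRecord(3L, new Date(1000L))); // not in DB: insert
        int[] reads = {0};
        List<ApiRecord> result = recordsToUpdate(api, id -> { reads[0]++; return db.get(id); });
        // prints "2 to update, 3 reads"
        System.out.println(result.size() + " to update, " + reads[0] + " reads");
    }
}
```

With the USING TIMESTAMP approach suggested in the replies, this filter and its lookups disappear: every record is written with its own lastModified as the timestamp, and the comparison happens inside Cassandra.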
>>>
>>>
>>>
>>> (It is rare for existing records to get updated, so I'm not that
>>> concerned about tombstones).
>>>
>>>
>>>
>>> The problem, however, is that I have to query each record's
>>> lastModified one at a time, which adds a major bottleneck to my job.
>>>
>>>
>>>
>>> E.g., if I have 6k records, I have to run a total of 6k 'select
>>> lastModified from myTable where id = ?' queries.
>>>
>>>
>>>
>>> Is there a better way? Am I doing anything wrong? Any suggestions
>>> would be appreciated.
>>>
>>>
>>>
>>> Thanks.
>>>
>>
>
