cassandra-user mailing list archives

From Eric Stevens <>
Subject Re: Data lost in Cassandra 3.5 single instance via Erlang driver
Date Wed, 15 Jun 2016 13:14:28 GMT
As a side note, if you're inserting records quickly enough that you're
potentially writing several in the same millisecond, it seems likely to me
that your partitions will be too large with day-level buckets,
((appkey, pub_date), pub_timestamp), unless your writes are very bursty.
You might need to bucket by hour, by 15 minutes, or something similar,
depending on what you expect your peak write rate to be.
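One rough way to compare bucket sizes is to multiply the peak write rate for one appkey by the bucket duration, which gives the number of rows that land in a single (appkey, bucket) partition. The sketch below assumes a steady, hypothetical peak of 1,000 writes per second; the function name and the rate are illustrative, so plug in your own numbers.

```python
# Back-of-envelope rows-per-partition estimate for time-bucketed partition keys.
# Assumes a constant write rate per appkey; the rate below is hypothetical.
BUCKET_SECONDS = {"day": 86_400, "hour": 3_600, "15min": 900}


def rows_per_partition(writes_per_second: float, bucket: str) -> int:
    """Rows that land in one (appkey, bucket) partition at a steady write rate."""
    return int(writes_per_second * BUCKET_SECONDS[bucket])


peak = 1_000  # hypothetical peak writes per second for a single appkey
for bucket in ("day", "hour", "15min"):
    print(f"{bucket:>6}: {rows_per_partition(peak, bucket):,} rows per partition")
```

Multiplying the row count by an average row size gives a rough partition size in bytes, which is the figure to compare against whatever limit you are targeting.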

One more note, slightly bikeshedding, but *personally*, when doing
time-based bucketing (the pub_date column), I prefer to use a timestamp and
floor the value I write.  This makes it easier to switch to a smaller bucket
size later without changing the format of the data in that column.
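A minimal sketch of that flooring approach, assuming timezone-aware UTC timestamps and buckets aligned to the Unix epoch (the helper name `floor_to_bucket` is hypothetical). The bucket value is always an ordinary timestamp, so moving from day to hour or 15-minute buckets only changes the `bucket` argument, not the column type.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_bucket(ts: datetime, bucket: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its epoch-aligned bucket."""
    whole_buckets = (ts - EPOCH) // bucket  # timedelta // timedelta -> int
    return EPOCH + whole_buckets * bucket


ts = datetime(2016, 6, 15, 13, 44, 28, tzinfo=timezone.utc)
for size in (timedelta(days=1), timedelta(hours=1), timedelta(minutes=15)):
    # The value written to pub_date would be the floored timestamp.
    print(size, "->", floor_to_bucket(ts, size).isoformat())
```

With this example timestamp, the day, hour, and 15-minute buckets start at 00:00, 13:00, and 13:30 on 2016-06-15 (UTC), respectively.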

On Wed, Jun 15, 2016 at 1:07 AM linbo liao <> wrote:

> Thanks Ben, Paul, Alain.  I debugged on the client side and found that the
> cause is duplicated pub_timestamp values.  I will use timeuuid instead.
> Thanks,
> Linbo
> 2016-06-15 13:09 GMT+08:00 Alain Rastoul <>:
>> On 15/06/2016 06:40, linbo liao wrote:
>>> I am not sure, but it looks like this will turn the insert into an
>>> update. If that is true, is the only option to include IF NOT EXISTS in
>>> the request, so that the client is told the write failed?
>>> Thanks,
>>> Linbo
>> Hi Linbo,
>> +1 to what Ben said: timestamp has millisecond precision and is a bad
>> choice for ensuring primary key uniqueness.
>> If your client and server are on the same physical machine (both on the
>> same computer, or different VMs on the same hypervisor), an insert can take
>> as little as a few microseconds (2~3 on a recent computer).
>> Your inserts will often become "updates".
>> The reason is that neither update nor delete really exists in Cassandra;
>> both are just "appends": an append with the same key for an update, or an
>> append of a tombstone for a delete.
>> You should try a timeuuid instead: it contains a node, a clock sequence,
>> and a counter in addition to the timestamp part (which you can extract
>> with CQL functions), and it exists for exactly this use case.
>> see here for the functions
>> --
>> best,
>> Alain
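To see why duplicate millisecond timestamps lose rows, the sketch below models a table as a Python dict keyed by the primary key. Writing to an existing key silently replaces the row, much as a Cassandra INSERT on an existing primary key acts as an overwrite. It then keys rows with Python's `uuid.uuid1()`, a version-1 time-based UUID of the same kind as a CQL timeuuid, and recovers the embedded millisecond timestamp. The dict model, the key layout, and the helper names are simplifying assumptions for illustration, not Erlang driver or Cassandra code.

```python
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def insert(table: dict, key: tuple, value: str) -> None:
    """Model an upsert: writing an existing primary key replaces the old value."""
    table[key] = value


def unix_millis_from_uuid1(u: uuid.UUID) -> int:
    """Recover the embedded timestamp (ms since the Unix epoch) from a v1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


ts_ms = 1_465_996_468_000  # both writes fall in the same millisecond
by_timestamp: dict = {}
insert(by_timestamp, ("app1", "2016-06-15", ts_ms), "first")
insert(by_timestamp, ("app1", "2016-06-15", ts_ms), "second")
print(len(by_timestamp))  # 1: the first row was overwritten

by_timeuuid: dict = {}
for value in ("first", "second"):
    insert(by_timeuuid, ("app1", "2016-06-15", uuid.uuid1()), value)
print(len(by_timeuuid))  # 2: distinct keys even within one millisecond

key = next(iter(by_timeuuid))
print(unix_millis_from_uuid1(key[2]))  # wall-clock ms embedded in the UUID
```

Python's `uuid1()` guarantees distinct values within a process, so both rows survive; on the Cassandra side, the CQL `now()` function or a driver-generated timeuuid plays the same role.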
