Hi Nick,

Thanks for the note! We have our cassanra instances installed on virtual hosts in VMWare and the clock synchronization is handled by the latter, so I can't use ntpdate (says that NTP socket is in use). Is there any way to check if the clocks are really synchronized? My best attempt was using three shell windows with commands already typed thus requiring only clicking on the window and hitting enter. The results varied by 100-200 msec which I guess is just about the time I need to click and press enter :)

Thanks in advance,
Alexander


2013/10/7 Nikolay Mihaylov <nmmm@nmmm.nu>
Hi

my two cents - before doing anything else, make sure clocks are synchronized to the millisecond.
ntp will do so.

Nick.


On Mon, Oct 7, 2013 at 9:02 AM, Alexander Shutyaev <shutyaev@gmail.com> wrote:
Hi all,

We have encountered the following problem with cassandra.

* We use cassandra v2.0.0 from Datastax community repo.

* We have 3 nodes in a cluster, all of them are seed providers.

* We have a single keyspace with replication factor = 3:

CREATE KEYSPACE bof WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '3'
};

* We use Datastax Java CQL Driver v1.0.3 in our application.

* We have not modified any consistency settings in our app, so I assume we have the default QUORUM (2 out of 3 in our case) consistency for reads and writes.

* We have 400+ tables which can be divided in two groups (main and uids). All tables in a group have the same definition, they vary only by name. The sample definitions are:

CREATE TABLE bookingfile (
  key text,
  entity_created timestamp,
  entity_createdby text,
  entity_entitytype text,
  entity_modified timestamp,
  entity_modifiedby text,
  entity_status text,
  entity_uid text,
  entity_updatepolicy text,
  version_created timestamp,
  version_createdby text,
  version_data blob,
  version_dataformat text,
  version_datasource text,
  version_modified timestamp,
  version_modifiedby text,
  version_uid text,
  version_versionnotes text,
  version_versionnumber int,
  versionscount int,
  PRIMARY KEY (key)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE TABLE bookingfile_uids (
  date text,
  timeanduid text,
  deleted boolean,
  PRIMARY KEY (date, timeanduid)
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='NONE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX BookingFile_uids_deleted ON bookingfile_uids (deleted);

* We don't have any problems with the tables from the main group.

* As for the tables from the uids group we have noticed that sometimes deletes from these tables do not do their job. They don't fail, they just do nothing. We have confirmed this by adding a select query after deletes. Most times everything is ok and select returns 0 records. But sometimes (~5 out of 100,000) it returns the supposedly deleted row.

* We have logged the ExecutionInfo objects with query tracing that are returned by Datastax's driver. Here are the details

DELETE FROM bookingfile_uids WHERE date=C20131006 AND timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e

ExecutionInfo: [
triedHosts=/10.10.30.23;
queriedHost=/10.10.30.23;
achievedConsistencyLevel=null;
queryTrace=
Message received from /10.10.30.23 on /10.10.30.19[Thread-56] at Sun Oct 06 19:55:57 MSK 2013
Acquiring switchLock read lock on /10.10.30.19[MutationStage:49] at Sun Oct 06 19:55:57 MSK 2013
Appending to commitlog on /10.10.30.19[MutationStage:49] at Sun Oct 06 19:55:57 MSK 2013
Adding to bookingfile_uids memtable on /10.10.30.19[MutationStage:49] at Sun Oct 06 19:55:57 MSK 2013
Enqueuing response to /10.10.30.23 on /10.10.30.19[MutationStage:49] at Sun Oct 06 19:55:57 MSK 2013
Sending message to /10.10.30.23 on /10.10.30.19[WRITE-/10.10.30.23] at Sun Oct 06 19:55:57 MSK 2013
Message received from /10.10.30.23 on /10.10.30.20[Thread-34] at Sun Oct 06 19:55:57 MSK 2013
Acquiring switchLock read lock on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:57 MSK 2013
Appending to commitlog on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:57 MSK 2013
Adding to bookingfile_uids memtable on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:57 MSK 2013
Enqueuing response to /10.10.30.23 on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:57 MSK 2013
Sending message to /10.10.30.23 on /10.10.30.20[WRITE-/10.10.30.23] at Sun Oct 06 19:55:57 MSK 2013
Determining replicas for mutation on /10.10.30.23[Native-Transport-Requests:1387368] at Sun Oct 06 19:55:57 MSK 2013
Sending message to /10.10.30.19 on /10.10.30.23[WRITE-/10.10.30.19] at Sun Oct 06 19:55:57 MSK 2013
Acquiring switchLock read lock on /10.10.30.23[MutationStage:46] at Sun Oct 06 19:55:57 MSK 2013
Sending message to /10.10.30.20 on /10.10.30.23[WRITE-/10.10.30.20] at Sun Oct 06 19:55:57 MSK 2013
Message received from /10.10.30.20 on /10.10.30.23[Thread-5] at Sun Oct 06 19:55:57 MSK 2013
Processing response from /10.10.30.20 on /10.10.30.23[RequestResponseStage:4] at Sun Oct 06 19:55:57 MSK 2013
Message received from /10.10.30.19 on /10.10.30.23[Thread-7] at Sun Oct 06 19:55:57 MSK 2013
Processing response from /10.10.30.19 on /10.10.30.23[RequestResponseStage:4] at Sun Oct 06 19:55:57 MSK 2013;
]

SELECT * FROM bookingfile_uids WHERE date=C20131006 AND timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e returned 1 record

the same query 1 second later:

DELETE FROM bookingfile_uids WHERE date=C20131006 AND timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e

ExecutionInfo: [
triedHosts=/10.10.30.20;
queriedHost=/10.10.30.20;
achievedConsistencyLevel=null;
queryTrace=
Message received from /10.10.30.20 on /10.10.30.19[Thread-57] at Sun Oct 06 19:55:58 MSK 2013
Determining replicas for mutation on /10.10.30.20[Native-Transport-Requests:1387705] at Sun Oct 06 19:55:58 MSK 2013
Acquiring switchLock read lock on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:58 MSK 2013
Appending to commitlog on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:58 MSK 2013
Adding to bookingfile_uids memtable on /10.10.30.20[MutationStage:43] at Sun Oct 06 19:55:58 MSK 2013
Sending message to /10.10.30.19 on /10.10.30.20[WRITE-/10.10.30.19] at Sun Oct 06 19:55:58 MSK 2013
Sending message to /10.10.30.23 on /10.10.30.20[WRITE-/10.10.30.23] at Sun Oct 06 19:55:58 MSK 2013
Message received from /10.10.30.19 on /10.10.30.20[Thread-4] at Sun Oct 06 19:55:58 MSK 2013
Processing response from /10.10.30.19 on /10.10.30.20[RequestResponseStage:6] at Sun Oct 06 19:55:58 MSK 2013
Message received from /10.10.30.20 on /10.10.30.23[Thread-18] at Sun Oct 06 19:55:58 MSK 2013
]

SELECT * FROM bookingfile_uids WHERE date=C20131006 AND timeAndUid=195248590_4762ce41-d2d2-448d-be8c-c7fcb6b7394e returned 0 records.

* Cassandra's system.log on all 3 nodes lists nothing special during these queries - just some compaction related INFO entries.

Can anyone help with this? What is our next step?

Thanks in advance,
Alexander