cassandra-user mailing list archives

From Jason Turner
Subject Data Streamed successfully but not queryable
Date Thu, 22 Oct 2015 17:23:30 GMT
Apologies for the long post but it's a complicated story...

I am running Cassandra in 3 datacenters 1,2 & 3. DCs 1 and 2 are working fine but I am
having trouble selecting any data at all in DC 3.

I've stripped DC3 down from 4 nodes to a single brand new node to make debugging logs/traces
etc easier. It's an AWS t2.medium, Cassandra 2.1.10 running on Amazon Linux AMI 2015.09. Java

I am using the CQLSSTableWriter/BulkLoader utility to stream a very small amount of test
data into an empty table. The BulkLoader output reports no errors and the system log looks fine
(session completed, all bytes received). I can see the Data.db files flushed to disk in the
appropriate data directory, but I can't select any of the data via CQL. No exceptions are thrown
and no rows are returned. It's as if it's simply ignoring data which I know exists.

My first thought was that the timestamps on the data might somehow be set in the future, but sstable2json
shows the relevant data exactly as I'd expect, with timestamps that match the upload times
and should be readable. It's a fresh install with NTP running, so I am ruling out any time drift
across the nodes.
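
For anyone who wants to repeat the timestamp check, here is a minimal sketch in Python. It assumes the cell timestamps are microseconds since the Unix epoch and that the trace times further down are UTC; the helper name write_time_utc is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def write_time_utc(micros):
    """Convert a cell timestamp (assumed microseconds since the Unix epoch) to UTC."""
    return EPOCH + timedelta(microseconds=micros)

# Cell timestamps copied from the sstable2json output in this post.
cells = {
    "streamed from DC3 (not returned)": 1445528225152000,
    "streamed from DC1 (returned)": 1445529108746000,
}

# Start time of the traced SELECT in this post; assumed to be UTC.
query_time = datetime(2015, 10, 22, 16, 6, 56, 39000, tzinfo=timezone.utc)

for label, micros in cells.items():
    written = write_time_utc(micros)
    print(f"{label}: {written.isoformat()}  in the future: {written > query_time}")
```

Under those assumptions both write times fall before the query time, so a future-dated timestamp does not look like the cause.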

It's worth noting that the CQLSSTableWriter/BulkLoader utility works fine streaming in data
centers 1 & 2. Data streamed from DCs 1 & 2 into DC 3 is immediately selectable. Data
streamed from DC3 into DCs 1&2 is also immediately selectable. Only data streamed from
DC3 into the DC3 cluster is ignored by select statements.

As an example, I've streamed the same test row into DC3 from DC1 and from DC3. There are now
2 rows/partitions in the table in DC3; here they are in JSON via sstable2json:

Doesn't Show (Streamed from same datacenter):

{"key": "e3589ff7-f753-4ff3-9809-365322306825",
"cells": [["bdff0420-78d2-11e5-a0cc-0607dbae485d:","",1445528225152000],

Shows Correctly (Streamed over VPN from different datacenter):

{"key": "1448b4de-7ebe-4d46-a2a4-365443970109",
"cells": [["ccaba8f0-78d4-11e5-ae6a-12a3a8f7b9be:","",1445529108746000],

Here is the trace on the select statement (no where clause - should return both rows), it
looks to me like it's finding both rows. However only one row (from DC1) is being returned
by the query.

Tracing session: eaafc370-78d6-11e5-9990-6f2f8a300360

activity | timestamp | source | source_elapsed
Execute CQL3 query | 2015-10-22 16:06:56.039000 | |
Parsing select * from sheet1; [SharedPool-Worker-1] | 2015-10-22 16:06:56.044000 | |
statement [SharedPool-Worker-1] | 2015-10-22 16:06:56.044000 | |
Computing ranges to query [SharedPool-Worker-1] | 2015-10-22 16:06:56.045000 | | 320
Submitting range requests on 257 ranges with a concurrency of 234 (0.42772275 rows per range expected) [SharedPool-Worker-1] | 2015-10-22 16:06:56.045000 | | 726
Executing seq scan across 3 sstables for [min(-9223372036854775808), min(-9223372036854775808)] [SharedPool-Worker-2] | 2015-10-22 16:06:56.045000 | | 1423
Read 0 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-10-22 16:06:56.045000 | | 1779
Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-10-22 16:06:56.045000 | | 1850
Scanned 2 rows and matched 2 [SharedPool-Worker-2] | 2015-10-22 16:06:56.045000 | |
Submitted 1 concurrent range requests covering 257 ranges [SharedPool-Worker-1] | 2015-10-22 16:06:56.046000 | | 5390
Request complete | 2015-10-22 16:06:56.045807 | |
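
To make the mismatch in the trace explicit, here is a rough sketch that tallies the "Read N live" and "Scanned N rows" lines. The activity strings are copied from the trace above with the thread names dropped; the regular expressions are only an assumption about how those two messages are worded, based on this one trace.

```python
import re

# Activity strings copied from the trace above (thread names omitted).
activities = [
    "Executing seq scan across 3 sstables for "
    "[min(-9223372036854775808), min(-9223372036854775808)]",
    "Read 0 live and 0 tombstone cells",
    "Read 1 live and 0 tombstone cells",
    "Scanned 2 rows and matched 2",
]

live_cells = 0
scanned = matched = None
for line in activities:
    m = re.match(r"Read (\d+) live and (\d+) tombstone cells", line)
    if m:
        live_cells += int(m.group(1))  # add up live cells over all partitions read
        continue
    m = re.match(r"Scanned (\d+) rows and matched (\d+)", line)
    if m:
        scanned, matched = int(m.group(1)), int(m.group(2))

print(f"rows scanned={scanned}, matched={matched}, live cells read={live_cells}")
```

Two rows are scanned and matched, but only one live cell is read in total. That would be consistent with one partition being found but its cell not being counted as live, although the trace alone does not show why.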

Can anyone suggest why my data isn't being returned or where to continue digging?

Thank you!
