cassandra-user mailing list archives

Subject RE: cassandra read performance jumps from one row to next
Date Thu, 23 Jan 2014 00:05:53 GMT

Trying to find out why a cassandra read is taking so long, I used tracing and limited the
number of rows. Strangely, when I query 600 rows, I get results in ~50 milliseconds. But 610
rows takes nearly 1 second!
cqlsh> select containerdefinitionid from containerdefinition limit 600;
... lots of output ...

Tracing session: 6b506cd0-83bc-11e3-96e8-e182571757d7

 activity                                                                                        | timestamp    | source | source_elapsed
                                                                                                 | 15:25:02,878 |        |              0
                                                                               Parsing statement | 15:25:02,878 |        |             39
                                                                             Preparing statement | 15:25:02,878 |        |            101
                                                                   Determining replicas to query | 15:25:02,878 |        |            152
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 15:25:02,879 |        |           1021
                                                                Scanned 755 rows and matched 755 | 15:25:02,933 |        |          55169
                                                                                Request complete | 15:25:02,934 |        |          56300
cqlsh> select containerdefinitionid from containerdefinition limit 610;
... just about the same output and trace info, except...

                                                                Scanned 766 rows and matched 766 | 15:25:58,908 |        |         739141
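To put the jump in numbers, here is a small sketch that compares the two "Scanned ... rows" trace lines. It assumes nothing beyond the figures quoted above and the fact that source_elapsed is reported in microseconds; the function and variable names are only illustrative.

```python
# Figures copied from the two traces above:
# (rows scanned, source_elapsed in microseconds at the "Scanned ..." event).
fast_scan = (755, 55169)    # query with limit 600
slow_scan = (766, 739141)   # query with limit 610


def average_cost_us(scan):
    """Average time per scanned row, in microseconds."""
    rows, elapsed_us = scan
    return elapsed_us / rows


def marginal_cost_us(before, after):
    """Average time per row for only the extra rows scanned in `after`."""
    extra_rows = after[0] - before[0]
    extra_us = after[1] - before[1]
    return extra_us / extra_rows


avg_fast = average_cost_us(fast_scan)
marginal = marginal_cost_us(fast_scan, slow_scan)

print(f"average per row over the first 755 rows: {avg_fast:.0f} us")
print(f"average per row over the 11 extra rows:  {marginal:.0f} us")
```

The first 755 rows cost about 73 microseconds each, while the 11 additional rows account for roughly 62 milliseconds each, so nearly all of the extra time is spent on that handful of rows.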
There seems to be nothing unusual about the data in those particular rows:
- Values are similar to those before and after.
- Using the COPY command I can export the whole table and import it on a different
  cluster, and performance is fine.
- These rows are the first example, but there seem to be other places where query time
  jumps as well. The whole table is only ~3000 rows but takes ~15 sec to list all primary keys.
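One way to locate each jump exactly is to bisect on the LIMIT value. The sketch below is only an outline under stated assumptions: time_query_ms is a hypothetical callable that runs the query with a given limit and returns the elapsed time in milliseconds, and query time is assumed not to drop again once it has jumped. The fake_timer stand-in and its jump point of 605 are invented for illustration, not measured.

```python
from typing import Callable


def first_slow_limit(
    time_query_ms: Callable[[int], float],
    lo: int,
    hi: int,
    threshold_ms: float,
) -> int:
    """Return the smallest limit in (lo, hi] whose query time exceeds threshold_ms.

    Assumes the query at `lo` is fast, the query at `hi` is slow, and the
    time never falls back below the threshold between them.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if time_query_ms(mid) > threshold_ms:
            hi = mid   # jump is at or before mid
        else:
            lo = mid   # jump is after mid
    return hi


def fake_timer(limit: int) -> float:
    """Stand-in for a real timed query; the jump at 605 is made up."""
    return 50.0 if limit < 605 else 740.0


print(first_slow_limit(fake_timer, 600, 610, 200.0))
```

With a real timer in place of fake_timer, repeating the search over later ranges would show whether the other jumps fall at regular intervals.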
There does seem to be something unusual about the data STORAGE:
- A snapshot copied to another cluster and imported gives the same results with the same limits.
- COPYing the data to CSV and then into another cluster does not; performance is great.
Have tried compaction, repair, reindex, cleanup and refresh. No effect.
I realize I could "fix" by copying data out and in, but I'm trying to figure out what is going
on here to avoid it happening in production on a table too big to fix with COPY.
Table has 17 columns, 3 indices, TEXT primary key, two LIST columns and two TIMESTAMP columns;
the rest are TEXT. Can reproduce issue with both SimpleStrategy and DC-aware replication.
Can reproduce with 4 copies of data on 4 servers, 2 copies on 2 servers and 1 copy on 2 servers
(so doesn't matter if query is performed locally or involves multiple servers). Cassandra-1.2
with cqlsh.
Any ideas? Suggestions?
