cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used
Date Wed, 22 Jul 2015 08:45:05 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636514#comment-14636514 ]

Sylvain Lebresne commented on CASSANDRA-8180:
---------------------------------------------

bq. Sylvain Lebresne are you happy to still be the reviewer or do you want to suggest someone
else?

I'm leaving on vacation at the end of the week and have enough on my plate until then that
it's safe to assume I won't have time to look at this one. I'm happy to have a look when I'm
back and have a tad more time, but it's probably a good idea to find another reviewer in the
meantime in case that new reviewer has time to get to it sooner. I would suggest Branimir
if he has some cycles since that ticket has a lot to do with {{MergeIterator}} and he has
been dealing with that quite a bit lately.

> Optimize disk seek using min/max column name meta data when the LIMIT clause is used
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 2.0.10
>            Reporter: DOAN DuyHai
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 3.x
>
>
> I was working on an example of a sensor data table (time series) and faced a use case where C* does not optimize reads from disk.
> {code}
> cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col)) WITH CLUSTERING ORDER BY (col DESC);
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30');
> ...
> >nodetool flush test test
> {code}
> After that, I activate request tracing:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:48:46,498 | 127.0.0.1 |              0
>                             Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 |             74
>                                                        Preparing statement | 23:48:46,499 | 127.0.0.1 |            253
>                                   Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 |            930
>                                               Acquiring sstable references | 23:48:46,499 | 127.0.0.1 |            943
>                                                Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 |           1032
>                                                Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 |           1160
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1173
>                                                Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 |           1889
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1901
>                                                Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 |           2373
>                                Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 |           2384
>  Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 |           2768
>                                 Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 |           2784
>                                         Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 |           2976
>                                                           Request complete | 23:48:46,501 | 127.0.0.1 |           3551
> {code}
> We can clearly see that C* hits 3 SSTables on disk instead of just one, although it has the min/max column meta data to decide which SSTable contains the most recent data.
> Funnily enough, if we add a clause on the clustering column to the select, this time C* optimizes the read path:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:52:31,888 | 127.0.0.1 |              0
>                Parsing SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; | 23:52:31,888 | 127.0.0.1 |             60
>                                                        Preparing statement | 23:52:31,888 | 127.0.0.1 |            277
>                                   Executing single-partition query on test | 23:52:31,889 | 127.0.0.1 |            961
>                                               Acquiring sstable references | 23:52:31,889 | 127.0.0.1 |            971
>                                                Merging memtable tombstones | 23:52:31,889 | 127.0.0.1 |           1020
>                                                Key cache hit for sstable 3 | 23:52:31,889 | 127.0.0.1 |           1108
>                                Seeking to partition beginning in data file | 23:52:31,889 | 127.0.0.1 |           1117
>  Skipped 2/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:52:31,889 | 127.0.0.1 |           1611
>                                 Merging data from memtables and 1 sstables | 23:52:31,890 | 127.0.0.1 |           1624
>                                         Read 1 live and 0 tombstoned cells | 23:52:31,890 | 127.0.0.1 |           1700
>                                                           Request complete | 23:52:31,890 | 127.0.0.1 |           2140
> {code}
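
For readers skimming the archive, the idea in the quoted description can be sketched as a toy model. This is not Cassandra code and not the patch under review: the `SSTable` class, the `select_desc` function and the `reads` counter are hypothetical names invented for illustration. The sketch assumes a single partition, descending clustering order, no tombstones, and that each clustering value lives in exactly one sstable. It shows two things: skipping sstables whose max clustering value falls outside the slice (what the second trace already does), and visiting the remaining sstables by descending max value so a LIMIT can stop early (what the ticket asks for).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (clustering value "col", payload "val")


@dataclass
class SSTable:
    """Hypothetical stand-in for one flushed sstable of a single partition."""
    name: str
    rows: List[Row]
    reads: int = 0  # counts simulated disk seeks

    @property
    def min_col(self) -> int:
        return min(col for col, _ in self.rows)

    @property
    def max_col(self) -> int:
        return max(col for col, _ in self.rows)

    def read(self) -> List[Row]:
        self.reads += 1
        return list(self.rows)


def select_desc(sstables: List[SSTable], limit: int,
                greater_than: Optional[int] = None) -> List[Row]:
    """Return the `limit` rows with the highest clustering values."""
    # Slice check: drop sstables that cannot contain col > greater_than.
    candidates = [t for t in sstables
                  if greater_than is None or t.max_col > greater_than]
    # Visit the sstable with the highest max clustering value first.
    candidates.sort(key=lambda t: t.max_col, reverse=True)

    found: List[Row] = []
    for table in candidates:
        # Once `limit` rows are known, an sstable whose max value is below
        # the smallest of them cannot contribute; neither can later ones,
        # because candidates are sorted by max_col descending.
        if len(found) >= limit and table.max_col < found[limit - 1][0]:
            break
        found.extend(r for r in table.read()
                     if greater_than is None or r[0] > greater_than)
        found.sort(key=lambda r: r[0], reverse=True)
    return found[:limit]
```

With three one-row sstables holding col = 10, 20 and 30, `select_desc(tables, limit=1)` reads only the sstable containing 30 in this model. A real implementation would also have to account for tombstones and for newer versions of the same row in other sstables, which this sketch deliberately ignores.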



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
