cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used
Date Thu, 21 Jan 2016 09:10:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110307#comment-15110307 ]

Branimir Lambov commented on CASSANDRA-8180:
--------------------------------------------

bq. What is an example of any other incomplete prefix and do we have a gap in the tests then?

Tombstones. A {{DELETE WHERE pk = ? AND ck1 = ?}} in a table with key {{(pk, ck1, ck2)}} will
generate one, because the deletion specifies only {{ck1}} and not the full clustering key.

bq. What I don't understand is how things like shouldInclude() in ClusteringIndexNamesFilter
or ClusteringIndexSliceFilter work.

If you look at the callsites for the method, you will see that they do more work in the presence
of tombstones. So one solution is not to use the {{min/maxClusteringValues}} in that case.
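
To illustrate the idea, here is a minimal sketch that assumes a single integer clustering column.
{{SSTableStats}} and {{SliceCheck}} are hypothetical names invented for this example and are not
Cassandra classes; the real filters operate on clustering prefixes, not plain integers.

```java
/** Hypothetical, simplified stand-in for per-sstable metadata (not a Cassandra class). */
final class SSTableStats {
    final int minClustering;      // smallest clustering value recorded for the sstable
    final int maxClustering;      // largest clustering value recorded for the sstable
    final boolean hasTombstones;  // whether the sstable contains any tombstones

    SSTableStats(int minClustering, int maxClustering, boolean hasTombstones) {
        this.minClustering = minClustering;
        this.maxClustering = maxClustering;
        this.hasTombstones = hasTombstones;
    }
}

final class SliceCheck {
    /** Returns true if the sstable has to be read for the inclusive slice [start, end]. */
    static boolean mustRead(SSTableStats stats, int start, int end) {
        // With tombstones present the recorded min/max may stem from an incomplete
        // prefix, so this sketch does not rely on them and always reads the sstable.
        if (stats.hasTombstones) {
            return true;
        }
        // Without tombstones, skip the sstable when its range cannot intersect the slice.
        return stats.maxClustering >= start && stats.minClustering <= end;
    }
}
```

With these assumptions, a slice equivalent to {{col > 25}} skips an sstable whose only value is 10,
unless that sstable contains tombstones.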

bq. \[MetadataSerializer.deserialize()\] should receive the total size to work out if there
is more stuff to read at the end.

No need for that, you can set a flag in {{Version}} to tell you whether or not the information
is present.
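
A rough sketch of what I mean, using made-up names: {{FormatVersion}}, its {{hasExtraBound}} flag
and {{StatsReader}} are assumptions for illustration only and do not reflect the actual
{{Version}} or {{MetadataSerializer}} code. The reader consults the flag instead of the remaining
size to decide whether the trailing field exists.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical format version; the flag name is an assumption for this sketch. */
final class FormatVersion {
    final boolean hasExtraBound;

    FormatVersion(boolean hasExtraBound) {
        this.hasExtraBound = hasExtraBound;
    }
}

final class StatsReader {
    /**
     * Reads a count and, only if the version says it was written, one extra trailing int.
     * Returns {count, extra}, with extra set to -1 when the field is absent.
     */
    static int[] read(FormatVersion version, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            // The version flag, not the remaining length, tells us whether to keep reading.
            int extra = version.hasExtraBound ? in.readInt() : -1;
            return new int[] { count, extra };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Older files are then read with a version whose flag is false, so the deserializer never attempts
to read past the fields they actually contain.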

bq. I'm not sure what you mean, \[use a RangeTombstoneBound\] for the test or the fix?

This is the fix. Instead of an empty row, the lower bound should be a {{RangeTombstoneBound}}
as described.

bq. The global lower bound is free, since it is available in the metadata. The index lower
bound is more accurate but it requires seeking the index file.

The way you use this class, by the time {{lowerBound()}} is called all of this has already been
done (by {{UnfilteredRowMergeIterator.create}}), possibly unnecessarily (if {{MergeIterator.OneToOne}}
is to be used). I would just move finding the bound to {{lowerBound()}}, and I don't think
it's even necessary to save the bound; just retrieve it there, since the method won't be called
more than once.
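
Something along these lines, as a sketch only: {{LazyBoundSource}} is a hypothetical class, the
bound is modelled as a plain int, and the {{lookups}} counter exists purely to show when the work
happens. Nothing is looked up at construction time and nothing is cached.

```java
import java.util.function.IntSupplier;

/** Hypothetical wrapper that defers the bound lookup (not Cassandra's actual iterator). */
final class LazyBoundSource {
    private final IntSupplier boundLookup; // potentially expensive, e.g. an index seek
    int lookups = 0;                       // demonstration-only counter

    LazyBoundSource(IntSupplier boundLookup) {
        // Only the lookup is stored here; no work is done up front.
        this.boundLookup = boundLookup;
    }

    /** Finds the bound on demand; no caching, since a single call is expected. */
    int lowerBound() {
        lookups++;
        return boundLookup.getAsInt();
    }
}
```

If the merge ends up not needing a bound at all, the lookup is never performed.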


> Optimize disk seek using min/max column name meta data when the LIMIT clause is used
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8180
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8180
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>         Environment: Cassandra 2.0.10
>            Reporter: DOAN DuyHai
>            Assignee: Stefania
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 8180_001.yaml, 8180_002.yaml
>
>
> I was working on an example sensor data table (time series) and faced a use case where C* does not optimize reads on disk.
> {code}
> cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col)) WITH CLUSTERING ORDER BY (col DESC);
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20');
> ...
> >nodetool flush test test
> ...
> cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30');
> ...
> >nodetool flush test test
> {code}
> After that, I activate request tracing:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:48:46,498 | 127.0.0.1 |              0
>                             Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 |             74
>                                                        Preparing statement | 23:48:46,499 | 127.0.0.1 |            253
>                                   Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 |            930
>                                               Acquiring sstable references | 23:48:46,499 | 127.0.0.1 |            943
>                                                Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 |           1032
>                                                Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 |           1160
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1173
>                                                Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 |           1889
>                                Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 |           1901
>                                                Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 |           2373
>                                Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 |           2384
>  Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 |           2768
>                                 Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 |           2784
>                                         Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 |           2976
>                                                           Request complete | 23:48:46,501 | 127.0.0.1 |           3551
> {code}
> We can clearly see that C* hits 3 SSTables on disk instead of just one, although it has the min/max column metadata to decide which SSTable contains the most recent data.
> Funnily enough, if we add a clause on the clustering column to the select, this time C* optimizes the read path:
> {code}
> cqlsh:test> SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1;
>  activity                                                                  | timestamp    | source    | source_elapsed
> ---------------------------------------------------------------------------+--------------+-----------+----------------
>                                                         execute_cql3_query | 23:52:31,888 | 127.0.0.1 |              0
>                Parsing SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; | 23:52:31,888 | 127.0.0.1 |             60
>                                                        Preparing statement | 23:52:31,888 | 127.0.0.1 |            277
>                                   Executing single-partition query on test | 23:52:31,889 | 127.0.0.1 |            961
>                                               Acquiring sstable references | 23:52:31,889 | 127.0.0.1 |            971
>                                                Merging memtable tombstones | 23:52:31,889 | 127.0.0.1 |           1020
>                                                Key cache hit for sstable 3 | 23:52:31,889 | 127.0.0.1 |           1108
>                                Seeking to partition beginning in data file | 23:52:31,889 | 127.0.0.1 |           1117
>  Skipped 2/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:52:31,889 | 127.0.0.1 |           1611
>                                 Merging data from memtables and 1 sstables | 23:52:31,890 | 127.0.0.1 |           1624
>                                         Read 1 live and 0 tombstoned cells | 23:52:31,890 | 127.0.0.1 |           1700
>                                                           Request complete | 23:52:31,890 | 127.0.0.1 |           2140
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
