Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5E6BF18E2C for ; Tue, 26 Jan 2016 04:25:41 +0000 (UTC) Received: (qmail 83450 invoked by uid 500); 26 Jan 2016 04:25:40 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 83417 invoked by uid 500); 26 Jan 2016 04:25:40 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 83361 invoked by uid 99); 26 Jan 2016 04:25:40 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 26 Jan 2016 04:25:40 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 12E992C1F57 for ; Tue, 26 Jan 2016 04:25:40 +0000 (UTC) Date: Tue, 26 Jan 2016 04:25:40 +0000 (UTC) From: "Stefania (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-8180) Optimize disk seek using min/max column name meta data when the LIMIT clause is used MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116655#comment-15116655 ] Stefania commented on CASSANDRA-8180: ------------------------------------- bq. That is, your code is currently completely disabled by the {{if (... iterator != null) return null; check in lowerBound()?}} Is there a reason to want the code committed then? It is enabled when there are no tombsones since the iterator is null. However we can do better, when there are tombstones we can still use the row index bound if there is one. bq. On the other hand, an assertion in {{UnfilteredIteratorWithLowerBound}} that saves the returned {{lowerBound()}} and checks the first {{next()}} to be greater-than or equal may be something we want to have anyway. I've added this. Whilst the test you proposed is fine, existing tests (e.g. {{DeleteTest.testDeleteWithRangeAndOneClusteringColumn()}}) started to fail but _only when compaction is enabled_. It seems we can have sstables with no tombstones, valid clustering values (e.g. 0) and yet the first item is a {{Bound.BOTTOM}}. Further, it seems that static columns could also have problems with clustering values so it's not just tombstones. See the comment in {{SinglePartitionReadCommand.shouldInclude()}}. Moving back to in progress, might end up modifying sstable metadata format after all. > Optimize disk seek using min/max column name meta data when the LIMIT clause is used > ------------------------------------------------------------------------------------ > > Key: CASSANDRA-8180 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8180 > Project: Cassandra > Issue Type: Improvement > Components: Local Write-Read Paths > Environment: Cassandra 2.0.10 > Reporter: DOAN DuyHai > Assignee: Stefania > Priority: Minor > Fix For: 3.x > > Attachments: 8180_001.yaml, 8180_002.yaml > > > I was working on an example of sensor data table (timeseries) and face a use case where C* does not optimize read on disk. > {code} > cqlsh:test> CREATE TABLE test(id int, col int, val text, PRIMARY KEY(id,col)) WITH CLUSTERING ORDER BY (col DESC); > cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 10, '10'); > ... > >nodetool flush test test > ... > cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 20, '20'); > ... > >nodetool flush test test > ... > cqlsh:test> INSERT INTO test(id, col , val ) VALUES ( 1, 30, '30'); > ... > >nodetool flush test test > {code} > After that, I activate request tracing: > {code} > cqlsh:test> SELECT * FROM test WHERE id=1 LIMIT 1; > activity | timestamp | source | source_elapsed > ---------------------------------------------------------------------------+--------------+-----------+---------------- > execute_cql3_query | 23:48:46,498 | 127.0.0.1 | 0 > Parsing SELECT * FROM test WHERE id=1 LIMIT 1; | 23:48:46,498 | 127.0.0.1 | 74 > Preparing statement | 23:48:46,499 | 127.0.0.1 | 253 > Executing single-partition query on test | 23:48:46,499 | 127.0.0.1 | 930 > Acquiring sstable references | 23:48:46,499 | 127.0.0.1 | 943 > Merging memtable tombstones | 23:48:46,499 | 127.0.0.1 | 1032 > Key cache hit for sstable 3 | 23:48:46,500 | 127.0.0.1 | 1160 > Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 | 1173 > Key cache hit for sstable 2 | 23:48:46,500 | 127.0.0.1 | 1889 > Seeking to partition beginning in data file | 23:48:46,500 | 127.0.0.1 | 1901 > Key cache hit for sstable 1 | 23:48:46,501 | 127.0.0.1 | 2373 > Seeking to partition beginning in data file | 23:48:46,501 | 127.0.0.1 | 2384 > Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:48:46,501 | 127.0.0.1 | 2768 > Merging data from memtables and 3 sstables | 23:48:46,501 | 127.0.0.1 | 2784 > Read 2 live and 0 tombstoned cells | 23:48:46,501 | 127.0.0.1 | 2976 > Request complete | 23:48:46,501 | 127.0.0.1 | 3551 > {code} > We can clearly see that C* hits 3 SSTables on disk instead of just one, although it has the min/max column meta data to decide which SSTable contains the most recent data. > Funny enough, if we add a clause on the clustering column to the select, this time C* optimizes the read path: > {code} > cqlsh:test> SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; > activity | timestamp | source | source_elapsed > ---------------------------------------------------------------------------+--------------+-----------+---------------- > execute_cql3_query | 23:52:31,888 | 127.0.0.1 | 0 > Parsing SELECT * FROM test WHERE id=1 AND col > 25 LIMIT 1; | 23:52:31,888 | 127.0.0.1 | 60 > Preparing statement | 23:52:31,888 | 127.0.0.1 | 277 > Executing single-partition query on test | 23:52:31,889 | 127.0.0.1 | 961 > Acquiring sstable references | 23:52:31,889 | 127.0.0.1 | 971 > Merging memtable tombstones | 23:52:31,889 | 127.0.0.1 | 1020 > Key cache hit for sstable 3 | 23:52:31,889 | 127.0.0.1 | 1108 > Seeking to partition beginning in data file | 23:52:31,889 | 127.0.0.1 | 1117 > Skipped 2/3 non-slice-intersecting sstables, included 0 due to tombstones | 23:52:31,889 | 127.0.0.1 | 1611 > Merging data from memtables and 1 sstables | 23:52:31,890 | 127.0.0.1 | 1624 > Read 1 live and 0 tombstoned cells | 23:52:31,890 | 127.0.0.1 | 1700 > Request complete | 23:52:31,890 | 127.0.0.1 | 2140 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)