Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id EE54117C82 for ; Tue, 31 Mar 2015 16:07:57 +0000 (UTC) Received: (qmail 99605 invoked by uid 500); 31 Mar 2015 16:07:57 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 99568 invoked by uid 500); 31 Mar 2015 16:07:57 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 99557 invoked by uid 99); 31 Mar 2015 16:07:57 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 31 Mar 2015 16:07:57 +0000 Date: Tue, 31 Mar 2015 16:07:57 +0000 (UTC) From: "Sylvain Lebresne (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388753#comment-14388753 ] Sylvain Lebresne commented on CASSANDRA-8099: --------------------------------------------- Thanks for some initial review. I've pushed a commit for most of the remarks. I answer the rest below: bq. I wonder if it will not be better to have two sub-classes for {{PrimaryKeyRestrictionSet}} one for the partition key and one for the clustering columns rather than having a boolean variable I don't disagree, but think we should generally clean the handling of partition keys so it doesn't "pretend" to be clustering columns (which will imply separating into 2 classes). And so I'd like to do that properly as a followup since it's not essential to this ticket (and I'm sure we can agree it's big enough as is). bq. In {{StatementRestrictions}} I do not understand the use of {{useFiltering}}. My understanding was that we should return an error message specifying that {{ALLOW FILTERING}} is required and that this problem should have been handled by {{checkNeedsFiltering}} in {{SelectStatement}}. If you have restriction on an indexed column but that restriction is no "queriable" (not an equality), we actually always reject the query (as in, even with {{ALLOW FILTERING}}) with an error message that says we can't find a usable 2ndary index. I'm not saying this is a good thing, it's really just historical and we should fix it, but this ticket is arguably not the right place for this (CASSANDRA-4987 would typically be a better place for that). We also don't even call {{needsFiltering}} if the query is not a range one (because we don't support {{ALLOW FILTERING}} there yet, which is CASSANDRA-6377), but we should still reject the queries desribed above (restriction on an indexed column but not one usable by the index) for single partition queries. Another way to put it is that the added validation is just the validation that is done on trunk in {{SecondaryIndexManager.validateIndexSearchersForQuery}} (and so was not handled by {{checkNeedsFiltering}}) which I moved in {{StatementRestrictions}} because that was convenient for the patch. TL;DR, we should clean all this in follow-ups, but I'd rather keep it simple for this ticket. > Refactor and modernize the storage engine > ----------------------------------------- > > Key: CASSANDRA-8099 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8099 > Project: Cassandra > Issue Type: Improvement > Reporter: Sylvain Lebresne > Assignee: Sylvain Lebresne > Fix For: 3.0 > > Attachments: 8099-nit > > > The current storage engine (which for this ticket I'll loosely define as "the code implementing the read/write path") is suffering from old age. One of the main problem is that the only structure it deals with is the cell, which completely ignores the more high level CQL structure that groups cell into (CQL) rows. > This leads to many inefficiencies, like the fact that during a reads we have to group cells multiple times (to count on replica, then to count on the coordinator, then to produce the CQL resultset) because we forget about the grouping right away each time (so lots of useless cell names comparisons in particular). But outside inefficiencies, having to manually recreate the CQL structure every time we need it for something is hindering new features and makes the code more complex that it should be. > Said storage engine also has tons of technical debt. To pick an example, the fact that during range queries we update {{SliceQueryFilter.count}} is pretty hacky and error prone. Or the overly complex ways {{AbstractQueryPager}} has to go into to simply "remove the last query result". > So I want to bite the bullet and modernize this storage engine. I propose to do 2 main things: > # Make the storage engine more aware of the CQL structure. In practice, instead of having partitions be a simple iterable map of cells, it should be an iterable list of row (each being itself composed of per-column cells, though obviously not exactly the same kind of cell we have today). > # Make the engine more iterative. What I mean here is that in the read path, we end up reading all cells in memory (we put them in a ColumnFamily object), but there is really no reason to. If instead we were working with iterators all the way through, we could get to a point where we're basically transferring data from disk to the network, and we should be able to reduce GC substantially. > Please note that such refactor should provide some performance improvements right off the bat but it's not it's primary goal either. It's primary goal is to simplify the storage engine and adds abstraction that are better suited to further optimizations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)