Date: Sat, 15 Feb 2014 13:51:20 +0000 (UTC)
From: "Benedict (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6704) Create wide row scanners

    [ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902405#comment-13902405 ]

Benedict commented on CASSANDRA-6704:
-------------------------------------

bq. Essentially the same thing, user code running inside cassandra.

Sort of, but there are some important differences: 1) as Brandon says, the code is clearly vetted by the database dev team deploying triggers, which can't be said here; and 2) we're all Java experts here, and the execution context is the normal execution context of Cassandra, which again we're all familiar with.
Helping users with issues from dynamic class compilation / loading of languages we don't understand is quite a different matter IMO, especially once sandboxing is introduced (which really would be essential as C*'s internal APIs are *not* safe to be accessed, nor protected, and could be used dangerously). It's not clear to me this will be pain-free from our side to ensure it always works, either. Also, with triggers we can more easily justify API breakages across minor/major versions that require some work when upgrading, as they're well contained within their Cassandra deployment. However, if we expose internal APIs to client code we will necessarily see more pushback on rapid development of these APIs, as the difficulty for users to migrate will be increased.

bq. The language that you chose to implement the filter with is your call.

This only seems to make my issue (1) worse, to my eyes.

bq. think about all the cql iterations like cql2 , execute_cql, execute_cql_3. Set keyspace set _consistency level.

Well, these things are all still present. We may retire CQL2 soon, but that has the advantage of having very quickly been superseded by CQL3, which to my knowledge does not have dramatically different syntax anyway - and yet it still has stuck around. It's not yet clear what this would be superseded by, or if the functionality would map easily, and maintaining a deprecated access method doesn't reduce the support burden.

I think it's a pretty nice underlying goal, but it's a really heavyweight feature that needs to be approached cautiously, and as Sylvain says, preferably coherently. I do wonder if it mightn't be possible to offer this as an easy-to-apply patch in the meantime, outside of the main Apache repository.
There are definitely some users that would be happy with the security risks and would love this to play with, but those people are power users who would be comfortable applying a simple patch to their C* instance, and would not contribute excessively to the support burden as they'd be competent enough to figure out any issues they have.

Just my 2c, anyway.

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over rows and columns. http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over ranges of row keys is less useful.
> However we can use the scanner concept to operate on wide rows. For example many times a user wishes to do some custom processing inside a row and does not wish to carry the data across the network to do this processing.
> I have already implemented thrift methods to compile dynamic groovy code into Filters as well as some code that uses a Filter to page through and process data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>         ColumnParent cp = new ColumnParent();
>         cp.setColumn_family("Standard1");
>         ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>         for (char a='a'; a < 'g'; a++){
>           Column c1 = new Column();
>           c1.setName((a+"").getBytes());
>           c1.setValue(new byte [0]);
>           c1.setTimestamp(System.nanoTime());
>           server.insert(key, cp, c1, ConsistencyLevel.ONE);
>         }
>
>         FilterDesc d = new FilterDesc();
>         d.setSpec("GROOVY_CLASS_LOADER");
>         d.setName("limit3");
>         d.setCode("import org.apache.cassandra.dht.* \n" +
>                 "import org.apache.cassandra.thrift.* \n" +
>             "public class Limit3 implements SFilter { \n " +
>             "public FilterReturn filter(ColumnOrSuperColumn col, List filtered) {\n"+
>             " filtered.add(col);\n"+
>             " return filtered.size()< 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n"+
>             "} \n" +
>           "}\n");
>         server.create_filter(d);
>
>
>         ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes()));
>         Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks but I wanted to get the concept out early so the design can see some criticism.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
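
For anyone following the thread without the branch checked out, the filter contract in the quoted snippet can be sketched in plain Java roughly as below. This is only an illustration under stated assumptions: `SFilter`, `FilterReturn`, `FILTER_MORE` and `FILTER_DONE` are the names used in the snippet, but the `scan` loop, the `limit` helper, the use of `String` columns in place of `ColumnOrSuperColumn`, and the inclusive start column are all hypothetical simplifications, not the code in the linked branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class WideRowScannerSketch {

    /** Signal returned by a filter after each column; names taken from the quoted snippet. */
    public enum FilterReturn { FILTER_MORE, FILTER_DONE }

    /** Hypothetical, simplified filter contract: columns are plain Strings here. */
    public interface SFilter {
        FilterReturn filter(String column, List<String> filtered);
    }

    /** Plain-Java counterpart of the Groovy "limit3" filter, generalized to n columns (n >= 1). */
    public static SFilter limit(int n) {
        return (column, filtered) -> {
            filtered.add(column);
            return filtered.size() < n ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
        };
    }

    /**
     * Walks the sorted columns of a single row, beginning at 'start'
     * (inclusive, by assumption), until the filter reports FILTER_DONE
     * or the row is exhausted. Only the filtered columns are returned.
     */
    public static List<String> scan(NavigableSet<String> row, String start, SFilter filter) {
        List<String> filtered = new ArrayList<>();
        for (String column : row.tailSet(start, true)) {
            if (filter.filter(column, filtered) == FilterReturn.FILTER_DONE) {
                break;
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        // Same data shape as the quoted test: columns 'a' through 'f' in one row.
        NavigableSet<String> row = new TreeSet<>();
        for (char a = 'a'; a < 'g'; a++) {
            row.add(String.valueOf(a));
        }
        System.out.println(scan(row, "a", limit(3))); // prints [a, b, c]
    }
}
```

The point of the sketch is that the filter, not the caller, decides when the scan stops, so only the filtered columns would need to cross the network. It says nothing about the sandboxing and class-loading concerns raised in the comment above.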