Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 587CB10462 for ; Tue, 18 Feb 2014 19:46:47 +0000 (UTC) Received: (qmail 45153 invoked by uid 500); 18 Feb 2014 19:46:40 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 45094 invoked by uid 500); 18 Feb 2014 19:46:39 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 45038 invoked by uid 99); 18 Feb 2014 19:46:36 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Feb 2014 19:46:36 +0000 Date: Tue, 18 Feb 2014 19:46:36 +0000 (UTC) From: "Edward Capriolo (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-6704) Create wide row scanners MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904480#comment-13904480 ] Edward Capriolo commented on CASSANDRA-6704: -------------------------------------------- But I see no reason why a query language could not use a scanner to help it answer a query. Potentially paging the results of N scans at once or something like that. > Create wide row scanners > ------------------------ > > Key: CASSANDRA-6704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6704 > Project: Cassandra > Issue Type: New Feature > Reporter: Edward Capriolo > Assignee: Edward Capriolo > > The BigTable white paper demonstrates the use of scanners to iterate over rows and columns. http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf > Because Cassandra does not have a primary sorting on row keys scanning over ranges of row keys is less useful. > However we can use the scanner concept to operate on wide rows. For example many times a user wishes to do some custom processing inside a row and does not wish to carry the data across the network to do this processing. > I have already implemented thrift methods to compile dynamic groovy code into Filters as well as some code that uses a Filter to page through and process data on the server side. > https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk > The following is a working code snippet. > {code} > @Test > public void test_scanner() throws Exception > { > ColumnParent cp = new ColumnParent(); > cp.setColumn_family("Standard1"); > ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes()); > for (char a='a'; a < 'g'; a++){ > Column c1 = new Column(); > c1.setName((a+"").getBytes()); > c1.setValue(new byte [0]); > c1.setTimestamp(System.nanoTime()); > server.insert(key, cp, c1, ConsistencyLevel.ONE); > } > > FilterDesc d = new FilterDesc(); > d.setSpec("GROOVY_CLASS_LOADER"); > d.setName("limit3"); > d.setCode("import org.apache.cassandra.dht.* \n" + > "import org.apache.cassandra.thrift.* \n" + > "public class Limit3 implements SFilter { \n " + > "public FilterReturn filter(ColumnOrSuperColumn col, List filtered) {\n"+ > " filtered.add(col);\n"+ > " return filtered.size()< 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n"+ > "} \n" + > "}\n"); > server.create_filter(d); > > > ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes())); > Assert.assertEquals(3, res.results.size()); > } > {code} > I am going to be working on this code over the next few weeks but I wanted to get the concept our early so the design can see some criticism. -- This message was sent by Atlassian JIRA (v6.1.5#6160)