cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6704) Create wide row scanners
Date Sat, 15 Feb 2014 15:55:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902444#comment-13902444 ]

Benedict commented on CASSANDRA-6704:
-------------------------------------

bq. Saying that users will be confused by two ways to do something is not a concern.  If you
could argue that some of those other features would be done in 1 week or 1 month maybe, but
in reality they look very far off.

You contradict yourself here. Either it is a valid concern or it is not; the timing of the
conflict is irrelevant, especially since features do not become widespread until months after
release, so 1 month is a very short time horizon. Try 1-2 years as a reasonable *minimum*
distance between features if you want to release approaches that conflict, IMO.

bq. Suggesting my code go into a fork because it may be redundant with some undone future
work by someone else is just plain silly.

Calling the concerns of several other people "plain silly" doesn't seem fair. You may not
be concerned, but clearly several others are. They are not all being "plain silly"; we're
not all out to stamp on your dreams.

bq. Sandboxing should be achieved by roles like a normal database new features should be added
like. 'GRANT CREATE FUNCTION'. Making a feature hard to use is not security.

Possibly, although "create function" rights in normal databases do not offer the client the
ability to access product internals. This is an unexpected and dangerous behaviour, one that
not everyone is going to get behind. Waving it away because it does not concern you does
not make it any less valid a concern.

bq. Cassandra has a large deficit here

I mostly agree with you, although not everyone will. However, I don't think rushing into one
person's ideal solution is the way forward for a large project like Cassandra. Let's take
some time to reach consensus on how to address this, if we want to. Others, I'm sure, think
keeping the compute side separate from Cassandra is the right way forward, and we shouldn't
ride roughshod over their position without addressing their concerns.

Personally, I would like to see tight integration with a scripting language at some point
in Cassandra. But I think that integration needs to be carefully considered, and not rushed
into. Not at the official release level, anyway.

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over rows and columns.
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over ranges of
row keys is less useful.
> However, we can use the scanner concept to operate on wide rows. For example, a user often
wishes to do some custom processing inside a row and does not wish to carry the data
across the network to do this processing.
> I have already implemented thrift methods to compile dynamic groovy code into Filters
as well as some code that uses a Filter to page through and process data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>       ColumnParent cp = new ColumnParent();
>       cp.setColumn_family("Standard1");
>       ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>       for (char a='a'; a < 'g'; a++){
>         Column c1 = new Column();
>         c1.setName((a+"").getBytes());
>         c1.setValue(new byte [0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>       }
>       
>       FilterDesc d = new FilterDesc();
>       d.setSpec("GROOVY_CLASS_LOADER");
>       d.setName("limit3");
>       d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>           "public class Limit3 implements SFilter { \n " +
>           "public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n"+
>           " filtered.add(col);\n"+
>           " return filtered.size()< 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n"+
>           "} \n" +
>         "}\n");
>       server.create_filter(d);
>       
>       
>       ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes()));
>       Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks but I wanted to get the
concept out early so the design can see some criticism.
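
For readers following the thread, the filter contract used in the snippet above can be sketched without a running Cassandra node. This is only an illustration under stated assumptions, not the code from the linked branch: `SFilter` and `FilterReturn` take their names from the snippet, but here columns are plain `String` names instead of `ColumnOrSuperColumn`, and the `scan` loop, the `ScannerSketch` class, and the `TreeMap`-backed row are hypothetical stand-ins for whatever the server-side paging code actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScannerSketch {
    // Mirrors the two FilterReturn values used in the snippet above.
    enum FilterReturn { FILTER_MORE, FILTER_DONE }

    // Simplified stand-in for SFilter: columns are plain String names here (assumption).
    interface SFilter {
        FilterReturn filter(String col, List<String> filtered);
    }

    // Hypothetical scan loop: walk one wide row's columns in sorted order,
    // starting at `start`, and stop as soon as the filter reports FILTER_DONE.
    static List<String> scan(NavigableMap<String, byte[]> row, String start, SFilter f) {
        List<String> filtered = new ArrayList<>();
        for (String name : row.tailMap(start, true).keySet()) {
            if (f.filter(name, filtered) == FilterReturn.FILTER_DONE) {
                break;
            }
        }
        return filtered;
    }

    // Builds a row with columns 'a'..'f' and applies the same logic as the Limit3 filter.
    static List<String> demo() {
        NavigableMap<String, byte[]> row = new TreeMap<>();
        for (char a = 'a'; a < 'g'; a++) {
            row.put(String.valueOf(a), new byte[0]);
        }
        SFilter limit3 = (col, filtered) -> {
            filtered.add(col);
            return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
        };
        return scan(row, "a", limit3);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c]
    }
}
```

In this sketch the filter both collects results and decides when to stop, so the loop itself needs no knowledge of limits or predicates; that is the property that would let such processing stay on the server rather than shipping the whole row to the client.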



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
