cassandra-commits mailing list archives

From "Edward Capriolo (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6704) Create wide row scanners
Date Sat, 15 Feb 2014 14:57:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902424#comment-13902424 ]

Edward Capriolo edited comment on CASSANDRA-6704 at 2/15/14 2:57 PM:
---------------------------------------------------------------------

{quote}
Sort of, but there are some important differences: 1) as Brandon says, the code is clearly
vetted by the database dev team deploying triggers, which can't be said here; and 2) we're
all Java experts here, and the execution context is the normal execution context of Cassandra,
which again we're all familiar with. Helping users with issues from dynamic class compilation
/ loading of languages we don't understand is quite a different matter IMO, especially once
sandboxing is introduced (which really would be essential as C*'s internal APIs are not safe
to be accessed, nor protected, and could be used dangerously). It's not clear to me this will
be pain free from our side to ensure it always works, either.

Also, with triggers we can more easily justify API breakages across minor/major versions that
require some work when upgrading, as they're well contained within their Cassandra deployment,
however if we expose internal APIs to client code we will necessarily see more pushback on
rapid development of these APIs, as the difficulty for users to migrate will be increased.
{quote}
You started your answer with "sort of". You are still allowing a user to put code in the execution
path. It is exactly the same problem whether you let someone compile dynamic code or let the
admin put a jar in a folder: either way you give someone the potential to break something. All dynamic
compiling does is make the result faster to break and faster to fix. Triggers would benefit from
this feature as well. Right now, if your trigger is wrong and you have a 100,000-node cluster,
fixing the problem could take days; if Cassandra adopted dynamic triggers, the problem could
be fixed in minutes.
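For readers weighing the two deployment paths, here is a minimal sketch of the "compile dynamic code" path on a plain JVM. It assumes Java source compiled at runtime with the JDK's `javax.tools` compiler, a deliberate swap for the Groovy class loader the patch uses, and the names `DynamicFilterLoader` and `compileAndInstantiate` are hypothetical. Each call compiles into a fresh class loader, which is what would let a corrected class replace a broken one without restarting the node.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: not part of Cassandra or of the CASSANDRA-6704 patch.
public class DynamicFilterLoader {
    /**
     * Compiles one public, package-less class from source at runtime and returns
     * a new instance of it. Each call uses a fresh class loader, so a corrected
     * version of the same class name can be loaded without restarting the JVM.
     */
    public static Object compileAndInstantiate(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available (JRE instead of JDK?)");
        }
        // javac expects a public class to live in <ClassName>.java
        Path dir = Files.createTempDirectory("dynamic-filter");
        Path sourceFile = dir.resolve(className + ".java");
        Files.write(sourceFile, source.getBytes(StandardCharsets.UTF_8));

        // Without -d, javac writes the .class file next to the source file.
        int status = compiler.run(null, null, null, sourceFile.toString());
        if (status != 0) {
            throw new IllegalArgumentException("Compilation failed for " + className);
        }

        // Fresh loader per compile; the parent supplies shared interfaces.
        // Left open on purpose: the loaded class may need it to resolve more classes.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { dir.toUri().toURL() }, DynamicFilterLoader.class.getClassLoader());
        Class<?> cls = loader.loadClass(className);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

A broken filter fails at `compiler.run` with a non-zero status instead of at node startup. The sketch does no sandboxing at all, which is the concern raised in the quote above. A Groovy-based variant would use `groovy.lang.GroovyClassLoader#parseClass` instead.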
{quote}
I think it's a pretty nice underlying goal, but it's a really heavyweight feature that needs
to be approached cautiously, and as Sylvain says, preferably coherently.{quote}
Please do not imply that this feature is incoherent or bad, which has already been done several
times. This is a good feature.

{quote}I do wonder if it mightn't be possible to offer this as an easy to apply patch in the
meantime, outside of the main Apache repository. {quote}
Nice compromise, but I do not think that is the Apache way. This option is forcing me into a fork. This is
a non-breaking change. It is a new feature. Cassandra is an open source project. I am a user.
I want a feature. Someone else on the thread says {quote}IMO, harnessing the invokeDynamic stuff
in the JVM thusly could have some compelling applications for us. {quote} I am not asking
other developers who build features with no votes to put their changes in forks, so you should not
ask me to do the same.


was (Author: appodictic):
{quote}
Sort of, but there are some important differences: 1) as Brandon says, the code is clearly
vetted by the database dev team deploying triggers, which can't be said here; and 2) we're
all Java experts here, and the execution context is the normal execution context of Cassandra,
which again we're all familiar with. Helping users with issues from dynamic class compilation
/ loading of languages we don't understand is quite a different matter IMO, especially once
sandboxing is introduced (which really would be essential as C*'s internal APIs are not safe
to be accessed, nor protected, and could be used dangerously). It's not clear to me this will
be pain free from our side to ensure it always works, either.

Also, with triggers we can more easily justify API breakages across minor/major versions that
require some work when upgrading, as they're well contained within their Cassandra deployment,
however if we expose internal APIs to client code we will necessarily see more pushback on
rapid development of these APIs, as the difficulty for users to migrate will be increased.
{quote}
You started your answer with "sort of". You are still allowing a user to put code in the execution
path. It is exactly the same problem whether you let someone compile dynamic code or let the
admin put a jar in a folder: either way you give someone the potential to break something. All dynamic
compiling does is make the result faster to break and faster to fix.
{quote}
I think it's a pretty nice underlying goal, but it's a really heavyweight feature that needs
to be approached cautiously, and as Sylvain says, preferably coherently.{quote}
Please do not imply that this feature is incoherent or bad, which has already been done several
times. This is a good feature.

{quote}I do wonder if it mightn't be possible to offer this as an easy to apply patch in the
meantime, outside of the main Apache repository. {quote}
Nice compromise, but I do not think that is the Apache way. This option is forcing me into a fork. This is
a non-breaking change. It is a new feature. Cassandra is an open source project. I am a user.
I want a feature. Someone else on the thread says {quote}IMO, harnessing the invokeDynamic stuff
in the JVM thusly could have some compelling applications for us. {quote} I am not asking
other developers who build features with no votes to put their changes in forks, so you should not
ask me to do the same.

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The Bigtable white paper demonstrates the use of scanners to iterate over rows and columns:
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not primarily sort rows by row key, scanning over ranges of row keys is less useful.
> However, we can apply the scanner concept to wide rows. For example, a user often wishes to do some custom processing inside a row without carrying the data across the network to do it.
> I have already implemented Thrift methods that compile dynamic Groovy code into Filters, as well as some code that uses a Filter to page through and process data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>       ColumnParent cp = new ColumnParent();
>       cp.setColumn_family("Standard1");
>       ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>       // Insert six columns named "a" through "f" with empty values into one row
>       for (char a = 'a'; a < 'g'; a++) {
>         Column c1 = new Column();
>         c1.setName(String.valueOf(a).getBytes());
>         c1.setValue(new byte[0]);
>         // Microseconds since the epoch, the usual Cassandra timestamp convention
>         c1.setTimestamp(System.currentTimeMillis() * 1000);
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>       }
>
>       // Register a Groovy filter that keeps the first three columns it is given
>       FilterDesc d = new FilterDesc();
>       d.setSpec("GROOVY_CLASS_LOADER");
>       d.setName("limit3");
>       d.setCode("import org.apache.cassandra.dht.* \n" +
>           "import org.apache.cassandra.thrift.* \n" +
>           "public class Limit3 implements SFilter { \n" +
>           "public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n" +
>           " filtered.add(col);\n" +
>           " return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n" +
>           "} \n" +
>           "}\n");
>       server.create_filter(d);
>
>       // Scan the row starting at column "a"; the filter stops after three columns
>       ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes()));
>       Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks, but I wanted to get the concept out early so the design can get some criticism.
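To make the scanner/filter contract in the issue concrete, here is a minimal, self-contained sketch that runs without Cassandra or Thrift. It assumes a wide row can be modelled as a sorted map from column name to value; the names `WideRowScannerSketch`, `ColumnFilter`, `scan` and the `pageSize` parameter are hypothetical. Only the `FILTER_MORE`/`FILTER_DONE` return values and the Limit3 logic mirror the test above; this is not the patch's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class WideRowScannerSketch {
    /** Mirrors the FILTER_MORE / FILTER_DONE values used by the Limit3 filter. */
    enum FilterReturn { FILTER_MORE, FILTER_DONE }

    /** Hypothetical stand-in for the patch's SFilter: sees one column at a time. */
    interface ColumnFilter {
        FilterReturn filter(String column, List<String> filtered);
    }

    /**
     * Pages through the columns of one sorted wide row, starting at startColumn
     * (inclusive), feeding each column to the filter until it returns FILTER_DONE
     * or the row is exhausted. Only the filtered list would leave the server.
     */
    static List<String> scan(NavigableMap<String, byte[]> row, String startColumn,
                             int pageSize, ColumnFilter filter) {
        List<String> filtered = new ArrayList<>();
        String cursor = startColumn;
        boolean inclusive = true;
        while (true) {
            // Read one page of at most pageSize column names from the cursor on.
            List<String> page = new ArrayList<>();
            for (String name : row.tailMap(cursor, inclusive).keySet()) {
                if (page.size() == pageSize) break;
                page.add(name);
            }
            if (page.isEmpty()) return filtered; // row exhausted
            for (String name : page) {
                if (filter.filter(name, filtered) == FilterReturn.FILTER_DONE) {
                    return filtered;
                }
            }
            // Resume strictly after the last column of this page.
            cursor = page.get(page.size() - 1);
            inclusive = false;
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, byte[]> row = new TreeMap<>();
        for (char a = 'a'; a < 'g'; a++) row.put(String.valueOf(a), new byte[0]);
        // Same logic as the Limit3 Groovy filter in the test.
        ColumnFilter limit3 = (col, filtered) -> {
            filtered.add(col);
            return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
        };
        System.out.println(scan(row, "a", 2, limit3)); // prints [a, b, c]
    }
}
```

The page size only controls how many columns are read per step; with Limit3 the scan still stops after three columns, matching the `assertEquals(3, ...)` in the issue's test.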



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
