cassandra-commits mailing list archives

From "Jon Haddad (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-10221) arbitrary predicate pushdown on CL=ONE
Date Fri, 28 Aug 2015 15:24:46 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-10221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Haddad updated CASSANDRA-10221:
-----------------------------------
    Description: 
For analytics workloads (in particular I'm thinking of Spark) it would be nice if we could add
any predicate to the WHERE clause.  I added the CL=ONE requirement since it seems like this
may be insane to do at any other consistency level.

Currently in the Spark connector, if you want to filter on an arbitrary column of a table,
you have to pull the entire table into memory via what is effectively a distributed SELECT *
with token ranges and CL=ONE (typically).  It would be much nicer to avoid pulling the extra
data into memory and have the node simply skip any row that doesn't satisfy the predicates.

I think for sanity this should require the ALLOW FILTERING clause.
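
To make the difference concrete, here is a rough sketch of the two query shapes.  It only
builds CQL strings and does not talk to a cluster.  The function names, the table ks.events,
and the column names are made up for illustration, the token bounds assume the
Murmur3Partitioner, and the pushdown query is the shape this ticket is asking for, not
something that is guaranteed to be accepted today.

```python
# Token bounds, assuming Murmur3Partitioner (signed 64-bit tokens).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(num_splits):
    """Divide the full token ring into contiguous (start, end] ranges."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def full_scan_query(table, partition_key, start, end):
    """Current approach: fetch every row in the range, filter on the client."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


def pushdown_query(table, partition_key, start, end, predicate):
    """Proposed approach (hypothetical): same range scan, but the node drops
    rows that fail the predicate before sending anything back."""
    base = full_scan_query(table, partition_key, start, end)
    return f"{base} AND {predicate} ALLOW FILTERING"


if __name__ == "__main__":
    # One query per token range; each would be issued at CL=ONE.
    for start, end in split_token_ring(4):
        print(pushdown_query("ks.events", "id", start, end, "status = 'failed'"))
```

In both cases the client still issues one query per token range.  The only change is that
the filter travels with the query instead of being applied after the rows arrive.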

  was:For analytics workloads it would be nice if we could add any predicate.  I added the
CL=ONE requirement since it seems like this may be insane to do with any other level of consistency.


> arbitrary predicate pushdown on CL=ONE
> --------------------------------------
>
>                 Key: CASSANDRA-10221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10221
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jon Haddad



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
