drill-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: [DISCUSS] Cassandra storage for Drill
Date Thu, 08 Jan 2015 23:36:57 GMT
Drill's framework does the same. Drill leverages some of Calcite's
extension capabilities to make pushdowns easy: storage subsystems can
expose optimizer rules (subclassed from Calcite's optimizer rule
construct). On top of what Calcite can do, Drill also understands concepts
like parallelization and data locality, and lets systems like Cassandra
expose this information to vastly improve performance, especially when
working across multiple systems.
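
To make the pushdown idea concrete, here is a minimal sketch of what such a rule does. It assumes simplified, hypothetical plan classes (`Predicate`, `Scan`, `Filter`) and a plugin-supplied `can_push` callback. None of these are Calcite or Drill APIs; a real rule would match on Calcite's relational operators instead. The rule matches a filter sitting directly on a scan. It moves every conjunct the storage system accepts into the scan and keeps the rest as a residual filter that Drill evaluates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


# Hypothetical, simplified plan nodes; these are NOT Calcite or Drill classes.
@dataclass(frozen=True)
class Predicate:
    column: str
    op: str      # comparison operator, e.g. "=" or "<"
    value: object


@dataclass
class Scan:
    table: str
    pushed: List[Predicate] = field(default_factory=list)  # evaluated by the source


@dataclass
class Filter:
    conjuncts: List[Predicate]  # AND-ed predicates evaluated by Drill
    child: Scan


def push_filter_into_scan(
    node: Union[Filter, Scan], can_push: Callable[[Predicate], bool]
) -> Union[Filter, Scan]:
    """Rewrite Filter(Scan): move conjuncts the storage plugin accepts into
    the scan and keep the rest in a residual Filter evaluated by Drill."""
    if not (isinstance(node, Filter) and isinstance(node.child, Scan)):
        return node  # pattern does not match; leave the plan unchanged
    accepted: List[Predicate] = []
    residual: List[Predicate] = []
    for p in node.conjuncts:
        (accepted if can_push(p) else residual).append(p)
    scan = Scan(node.child.table, node.child.pushed + accepted)
    return Filter(residual, scan) if residual else scan
```

Keeping the residual filter is what makes partial pushdown safe. The scan may return extra rows, but it never drops rows the query needs.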

On Thu, Jan 8, 2015 at 12:41 PM, Julian Hyde <julianhyde@gmail.com> wrote:

> Calcite’s adapter framework makes it easy to push down filters and
> aggregations to third-party sources, and to express more powerful,
> data-source-specific optimizations.
>
> Is Drill building on Calcite’s support, or doing it its own way?
>
> Calcite doesn’t have a Cassandra adapter, but the same approach taken in
> the MongoDB, Splunk, and Phoenix adapters could be used.
>
> On Jan 8, 2015, at 9:11 AM, Tomer Shiran <tshiran@gmail.com> wrote:
>
> > I think that any valid SQL statement should work with any data source.
> > Drill should:
> >
> >   - Push down as much processing as possible into the data source
> >   (Cassandra in this case)
> >   - Maintain as much data locality as possible (i.e., spread the work so
> >   that each drillbit is handling local data)
> >   - In the worst case, Drill should pull the entire table from the data
> >   source if that's what's needed to satisfy the query.
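
The locality point can be sketched as a split-assignment step. This assumes a hypothetical model in which each split is a Cassandra token range tagged with the hosts holding its replicas, and drillbits are identified by host name. In a real plugin that metadata would come from the cluster, not from hand-built tuples. With no pushdown at all (the worst case above), every token range of the table simply becomes a split, so the same step also covers the full-scan fallback.

```python
from typing import Dict, List, Tuple

# Hypothetical model: (token_range_id, hosts holding a replica of that range).
Split = Tuple[str, List[str]]


def assign_splits(splits: List[Split], drillbits: List[str]) -> Dict[str, List[str]]:
    """Assign each split to the least-loaded drillbit on one of its replica
    hosts; if no replica host runs a drillbit, fall back to the least-loaded
    drillbit overall, which then reads the range over the network."""
    assignment: Dict[str, List[str]] = {d: [] for d in drillbits}
    for range_id, replicas in splits:
        local = [d for d in drillbits if d in replicas]
        candidates = local or drillbits  # remote read only when nothing is local
        target = min(candidates, key=lambda d: len(assignment[d]))
        assignment[target].append(range_id)
    return assignment
```

Balancing by split count is a deliberate simplification. Token ranges can differ a lot in size, so a real assignment would likely weigh estimated data volume instead.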
> >
> >
> > On Thu, Jan 8, 2015 at 8:29 AM, Yash Sharma <yash360@gmail.com> wrote:
> >
> >> Hi Folks,
> >> This thread is to discuss a few scenarios in how Cassandra works, and
> >> how we think they should be supported in Drill.
> >>
> >> While these are not supported natively by Cassandra, they are doable on
> >> Drill's end once we fetch a superset of the data (i.e., without applying
> >> these filters in Cassandra):
> >>
> >> 1. Filtering on a non-indexed column in Cassandra
> >> 2. Filtering by a subset of the primary key
> >> 3. An OR condition in the WHERE clause
> >>
> >> Should we apply the filters at Drill's end and support these features, or
> >> propagate an error back to the user asking for a valid Cassandra-based
> >> query?
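
Applying the filter at Drill's end amounts to evaluating the user's complete predicate over whatever Cassandra returned. Here is a minimal sketch. It assumes rows arrive as plain Python dicts and the predicate is an ordinary function, both hypothetical stand-ins for Drill's record batches and generated filter code. The sample rows are made up to match the (id, rank, pog_id) shape of the table in the examples below.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]  # hypothetical stand-in for a Drill record


def filter_in_drill(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Evaluate the user's complete WHERE predicate on rows fetched from
    Cassandra. As long as the fetched rows are a superset of the answer
    (in the worst case, the whole table), the result is correct."""
    return (row for row in rows if predicate(row))


# Hypothetical sample rows with the (id, rank, pog_id) shape.
rows = [
    {"id": "id0004", "rank": 4, "pog_id": 10002},
    {"id": "id0001", "rank": 4, "pog_id": 10004},
    {"id": "id0002", "rank": 1, "pog_id": 10004},
]
# Case 1: filter on the non-indexed column pog_id.
by_pog = list(filter_in_drill(rows, lambda r: r["pog_id"] == 10004))
# Case 3: OR condition, which CQL rejects outright.
by_or = list(filter_in_drill(rows, lambda r: r["rank"] == 4 or r["id"] == "id0004"))
```

The trade-off is purely cost. The answer is correct either way, but every row Cassandra could not filter has to be shipped to Drill first.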
> >>
> >> -----
> >> Examples:
> >> Here 'trending_now' is a dummy table with columns (id, rank, pog_id),
> >> where (id, rank) is the primary key.
> >> 1.
> >> cqlsh:recsys> select * from trending_now where pog_id=10004 ;
> >> Bad Request: No indexed columns present in by-columns clause with Equal
> >> operator
> >>
> >> 2.
> >> cqlsh:recsys> select * from trending_now where rank=4;
> >> Bad Request: Cannot execute this query as it might involve data filtering
> >> and thus may have unpredictable performance. If you want to execute this
> >> query despite the performance unpredictability, use ALLOW FILTERING
> >> P.S. ALLOW FILTERING is not permitted in the Cassandra Java driver as of now.
> >>
> >> 3.
> >> cqlsh:recsys> select * from trending_now where rank=4 or id='id0004';
> >> Bad Request: line 1:40 missing EOF at 'or'
> >>
> >> 4. Valid Query:
> >> cqlsh:recsys> select * from trending_now where id='id0004' and rank=4;
> >>
> >> id     | rank | pog_id
> >> --------+------+--------
> >> id0004 |    4 |  10002
> >>
> >> (1 rows)
> >>
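
For the trending_now table, Drill can decide up front which conjuncts to hand to Cassandra. The sketch below is a simplified model of the CQL restrictions seen in examples 1, 2 and 4. It assumes equality predicates only, no secondary indexes, and no ALLOW FILTERING. The function name and return shape are hypothetical. It emits a CQL string with `?` bind markers plus the values to bind, and returns whatever it could not push as a residual filter for Drill. An OR, as in example 3, is not a conjunction, so that whole predicate would stay in Drill over a full scan.

```python
from typing import Any, Dict, List, Tuple


def split_for_cql(
    table: str,
    eq_conjuncts: Dict[str, Any],
    partition_keys: List[str],
    clustering_keys: List[str],
) -> Tuple[str, List[Any], Dict[str, Any]]:
    """Split AND-ed equality predicates into a CQL query Cassandra accepts
    without ALLOW FILTERING, plus a residual filter Drill must apply itself.

    Simplified rules (assumption): every partition key column must be
    restricted before any key predicate is pushed; clustering columns are
    pushed only as a contiguous prefix; all other columns stay residual.
    Secondary indexes, range operators and OR are not modelled here."""
    pushed: Dict[str, Any] = {}
    if all(k in eq_conjuncts for k in partition_keys):
        for k in partition_keys:
            pushed[k] = eq_conjuncts[k]
        for k in clustering_keys:
            if k not in eq_conjuncts:
                break  # gap in the clustering prefix: stop pushing
            pushed[k] = eq_conjuncts[k]
    residual = {c: v for c, v in eq_conjuncts.items() if c not in pushed}
    cql = f"SELECT * FROM {table}"
    if pushed:
        cql += " WHERE " + " AND ".join(f"{c} = ?" for c in pushed)
    return cql, list(pushed.values()), residual
```

When a residual is left over, Drill would run the returned CQL and then filter the rows itself. That is the "fetch a superset, filter in Drill" approach proposed in this thread.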
>
>
