Mailing-List: contact dev-help@drill.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@drill.apache.org
MIME-Version: 1.0
In-Reply-To: <784E2D75-EC14-4779-9586-9E533F00B599@gmail.com>
References: 
 <CAFi_UEdLvyxgJX7yVzqTRnoY7bXCbSEZJ-8+7y_Jtd2DYT0FAw@mail.gmail.com>
	<CAJkA4MHEoC1JT+3iUFwFpNGGdHzkHmZah55d2ATAfDGQbMjaYw@mail.gmail.com>
	<784E2D75-EC14-4779-9586-9E533F00B599@gmail.com>
Date: Thu, 8 Jan 2015 15:36:57 -0800
Message-ID: 
 <CAKa9qD=_7Ne9r4cFy8BPyn_mKwju+uootMm1GZy1paJXugSc-g@mail.gmail.com>
Subject: Re: [DISCUSS] Cassandra storage for Drill
From: Jacques Nadeau <jacques@apache.org>
To: "dev@drill.apache.org" <dev@drill.apache.org>
Content-Type: multipart/alternative; boundary=047d7bdca1001267e8050c2c84e4

--047d7bdca1001267e8050c2c84e4
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Drill's framework does the same.  Drill leverages some of Calcite's
extension capabilities to allow very easy pushdowns by allowing storage
subsystems to expose optimizer rules (subclassed on top of Calcite's
optimizer rule construct).  On top of what Calcite can do, Drill also
understands concepts like parallelization and data locality and lets systems
like Cassandra expose this information to vastly improve performance,
especially when working across multiple systems.
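
As a rough illustration of what such a rule ends up deciding (a sketch only:
this is not Drill or Calcite code, and the names Eq, split_conjunction and
apply_residual are made up for this example), a filter can be split into the
part the source can evaluate and the part Drill evaluates itself.  The
simplifying assumption is that Cassandra only accepts equality predicates
that cover a leading prefix of the primary key (id, rank) from the example
quoted below; everything else stays in Drill.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Eq:
    """A single `column = value` predicate (hypothetical helper type)."""
    column: str
    value: Any


def split_conjunction(
    predicates: Sequence[Eq], primary_key: Sequence[str]
) -> Tuple[List[Eq], List[Eq]]:
    """Split AND-ed equality predicates into (pushed, residual).

    Assumption: the source accepts only predicates that form a leading
    prefix of the primary key. Everything else is left for the engine.
    """
    by_column: Dict[str, Eq] = {}
    for p in predicates:
        by_column.setdefault(p.column, p)  # first predicate per column wins

    pushed: List[Eq] = []
    for column in primary_key:
        if column not in by_column:
            break  # prefix is broken; later key columns cannot be pushed
        pushed.append(by_column[column])

    residual = [p for p in predicates if p not in pushed]
    return pushed, residual


def apply_residual(rows: Sequence[Dict[str, Any]], residual: Sequence[Eq]):
    """Filter rows fetched from the source with the predicates it could not take."""
    return [r for r in rows if all(r.get(p.column) == p.value for p in residual)]


PRIMARY_KEY = ("id", "rank")

# id can be pushed to the source; pog_id has to be checked on the engine side.
pushed, residual = split_conjunction(
    [Eq("id", "id0004"), Eq("pog_id", 10002)], PRIMARY_KEY
)
```

With only rank=4 in the filter nothing is pushed, so the sketch falls back to
reading the whole table and filtering afterwards, which is the worst case
described in the quoted thread.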

On Thu, Jan 8, 2015 at 12:41 PM, Julian Hyde <julianhyde@gmail.com> wrote:

> Calcite’s adapter framework makes it easy to push down filters and
> aggregations to third-party sources, and to express more powerful and
> data-source-specific optimizations.
>
> Is Drill building on Calcite’s support or doing it its own way?
>
> Calcite doesn’t have a Cassandra adapter but the same approach taken in
> the MongoDB, Splunk, Phoenix adapters could be used.
>
> On Jan 8, 2015, at 9:11 AM, Tomer Shiran <tshiran@gmail.com> wrote:
>
> > I think that any valid SQL statement should work with any data source.
> > Drill should:
> >
> >   - Push down as much processing as possible into the data source
> >   (Cassandra in this case)
> >   - Maintain as much data locality as possible (i.e., spread the work so
> >   that each drillbit is handling local data)
> >   - In the worst case, Drill should pull the entire table from the data
> >   source if that's what's needed to satisfy the query.
> >
> >
> > On Thu, Jan 8, 2015 at 8:29 AM, Yash Sharma <yash360@gmail.com> wrote:
> >
> >> Hi Folks,
> >> This thread is to discuss a few scenarios of how Cassandra works and how
> >> we think it should be supported in Drill.
> >>
> >> The following are not supported by Cassandra itself, but they are doable
> >> on Drill's end once we fetch a superset of the data without these
> >> conditions.
> >>
> >> 1. Filtering non indexed column in Cassandra
> >> 2. Filtering by subset of primary key
> >> 3. OR condition in where clause
> >>
> >> Should we apply filters at Drill's end and support these features, or
> >> should we propagate an error back to the user asking for a valid
> >> Cassandra-based query?
> >>
> >> -----
> >> Examples:
> >> Here 'trending_now' is a dummy table with (id, rank, pog_id) where
> >> (id,rank) is primary key pair.
> >> 1.
> >> cqlsh:recsys> select * from trending_now where pog_id=10004 ;
> >> Bad Request: No indexed columns present in by-columns clause with Equal
> >> operator
> >>
> >> 2.
> >> cqlsh:recsys> select * from trending_now where rank=4;
> >> Bad Request: Cannot execute this query as it might involve data filtering
> >> and thus may have unpredictable performance. If you want to execute this
> >> query despite the performance unpredictability, use ALLOW FILTERING
> >> P.S. ALLOW FILTERING is not permitted in Cassandra java driver as of now.
> >>
> >> 3.
> >> cqlsh:recsys> select * from trending_now where rank=4 or id='id0004';
> >> Bad Request: line 1:40 missing EOF at 'or'
> >>
> >> 4. Valid Query:
> >> cqlsh:recsys> select * from trending_now where id='id0004' and rank=4;
> >>
> >>  id     | rank | pog_id
> >> --------+------+--------
> >>  id0004 |    4 |  10002
> >>
> >> (1 rows)
> >>
>
>

--047d7bdca1001267e8050c2c84e4--