incubator-drill-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: What do you want out of Apache Drill?
Date Wed, 13 Mar 2013 16:42:43 GMT
I have a feeling that large joins will be dealt with sooner rather than
later (especially with interest and work from people like you).  If you
look at large queries, things are dominated by large sorts, large joins and
large group-by aggregations.  We need to make sure those are performant in
large clusters before we focus on the prettier things.  Hopefully we can
leverage Google Compute Engine to ensure this.



On Wed, Mar 13, 2013 at 7:07 AM, David Alves <davidralves@gmail.com> wrote:

> Hi All
>
>         Sorry to revive an old thread…
>         I was going through the list looking for the current stance on
> joins and I found Ted's answer.
>         What is the main point behind not doing large joins on Drill?
>         Is it just simplicity (as in optimizer, etc.) or is there
> something else?
>         I mention this because I'm particularly interested in large self
> joins (I can volunteer to work on them myself, of course).
>         I'm not against leaving them out of any optimizer goals, if one
> can explicitly select an identity optimizer that will just follow the
> logical plan, but they are a big requirement for me.
>         Thoughts?
>
> Best
> David
>
> On Dec 6, 2012, at 7:33 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Drill is explicitly designed (at this time) with the option of not doing
> > large joins.  Triple stores pretty much assume lots of large joins.
> >
> > That said, if you could write some suggested typical queries, it would help
> > the discussion along.  If you could go so far as to translate to a logical
> > plan, that would be even cooler.
> >
> > On Fri, Dec 7, 2012 at 2:25 AM, Mike Kogan <mkogan@gmail.com> wrote:
> >
> >> I would very much be interested in having a SPARQL interface, though I am
> >> not sure how well Drill will handle many joins.
> >>
> >>
> >> On Thu, Dec 6, 2012 at 5:13 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >>
> >>> On Thu, Dec 6, 2012 at 8:44 PM, Julian Hyde <julianhyde@gmail.com> wrote:
> >>>
> >>>> ...
> >>>> 1 A SQL interface (in addition to DrQL interface)
> >>>>
> >>>
> >>> With your help, this may arrive before DrQL is integrated.
> >>>
> >>>
> >>>> 2 JDBC driver
> >>>>
> >>>
> >>> Should be pretty straightforward.  Not on anybody's task list just yet, I
> >>> don't think.
> >>>
> >>>
> >>>> 3 Access to the stack at a lower level (i.e. a way to use the
> >>>> high-performance scan operators without writing a query)
> >>>>
> >>>
> >>> Definitely going to happen.
> >>>
> >>>
> >>>> 4 Ability to query in-memory Java data in a compact form (e.g. arrays of
> >>>> primitives or nio buffers)
> >>>>
> >>>
> >>> I wonder if this is just a matter of writing a special scanner or a special
> >>> flavor of join at the execution point.  The scanner would cover the case
> >>> where the in-memory compact form is only readable sequentially; the join
> >>> operator would cover the case where the memory can be accessed at random.
> >>>
> >>> ...
> >>>> I know some of these are outside of Drill's scope. If so, feel free to
> >>>> disregard. But if you don't ask, you don't get. :)
> >>>>
> >>>
> >>> They all look pretty reasonable to me.
> >>>
> >>
>
>
