drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hanifi Gunes <hgu...@maprtech.com>
Subject Re: Incorrect logic in DrillScanRel
Date Tue, 10 Feb 2015 00:06:48 GMT
Filed https://issues.apache.org/jira/browse/DRILL-2192 to track this.

On Sun, Feb 8, 2015 at 3:03 PM, Jacques Nadeau <jacques@apache.org> wrote:

> I think the following makes sense (from your email)
>
> null -> scan-all
> non-empty with star -> scan-all
> non-empty w/o star -> regular / scan-some
> empty -> count(*) / skip-all
>
> The reason I identified it in the DrillScanRel is because the planner
> actually returns an empty set for the count(*) case as that actually gets
> rewritten as count(0) or count(1) if I recall correctly.  As such, the list
> of columns needed to satisfy are the empty set.
>
> Agreed that most (all?) readers don't support currently.  We should address
> this separately and make sure the readers handle what they are given (e.g.
> if they don't support skip-all, they should convert to something they do
> support) rather than removing the capability in the Rel layer.
>
> On Sun, Feb 8, 2015 at 2:44 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>
> > - We should treat an empty list different from a null list different from
> > an all list.
> > Agreed. What would each mean to DrillScanRel though?
> > null -> scan-all or problem?
> > non-empty with star -> scan-all
> > non-empty w/o star -> regular / scan-some
> > empty -> count(*) / skip-all
> >
> >
> > I have no concrete understanding of what planner outputs as the list of
> > columns in case of count(*) vs select * queries. It would be nice to
> > differentiate these cases without use of "null" to mean anything special
> > though. Earlier, planner -at least in DrillScanRule- used to invoke a
> c'tor
> > that used to pass "null" to mean "all" which led some ambiguity: is it
> that
> > code is broken or callee asks me to scan everything? This was one of the
> > reasons for disallowing nulls. Another one was to wipe out all null
> checks
> > on columns from all of the readers.
> >
> >
> > The other part of the consideration is readers. Readers don't seem to
> > support skip-all queries particularly for count. Looks like the current
> > JsonReader implementation(so does mongo reader) after major refactoring
> > also relies on a non-empty list of columns while ensuring at least one
> > column is scanned for count queries, which I doubt works as intended in
> > case an empty list is used to mean skip-all. Part of the code change to
> > make count(*) more efficient is to make every reader understand skip-all
> > for count queries.
> >
> >
> > Earlier we had some discussions on how to make count(*) more efficient as
> > well. A good starting point could be to resolve ambiguity at the planner
> > side with the help of planning experts perhaps. If reader knows for sure
> > that she executing a count(*) query, she should be able to output a cheap
> > count vector instead of scanning everything without enforcing schema
> check
> > on the count column for readers that is schema-ed.
> >
> > How does this sound?
> >
> >
> > Thanks.
> > -Hanifi
> >
> >
> > On Sat, Feb 7, 2015 at 6:47 PM, Jacques Nadeau <jacques@apache.org>
> wrote:
> >
> > > Hey Guys,
> > >
> > > When reviewing our code I ran across this code:
> > >
> > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java#L70
> > >
> > > The problem with this code is it converts no columns into every column.
> > > This means we waste a bunch of time readers where we don't need to
> > project
> > > any columns.  An example is select count(*) which turns into select
> > > count(0).  In that case, we should avoid projecting any columns other
> > than
> > > a count column.  However, because we translate no columns into every
> > > column, we read way more than we should.  I remember someone (maybe
> > Hanifi)
> > > working through some rules around not allowing a blank list of columns
> > > anywhere.  What was the thinking and how do we correct.  We should
> treat
> > an
> > > empty list different from a null list different from an all list.
> > >
> > > thoughts?
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message