impala-dev mailing list archives

From yu feng <olaptes...@gmail.com>
Subject Re: about broadcast join and hash shuffle join
Date Mon, 08 May 2017 01:45:45 GMT
Great! I agree that defaulting to partitioned joins should reduce the risk of
disastrous plans.

2017-05-05 22:20 GMT+08:00 Thomas Tauber-Marshall <tmarshall@cloudera.com>:

> There's actually a review out right now for changing the default join
> algorithm when stats are unavailable to partitioned:
> https://gerrit.cloudera.org/#/c/6803/
>
> On Fri, May 5, 2017 at 4:44 AM yu feng <olaptestyu@gmail.com> wrote:
>
> > Hi All:
> >
> > I see that Impala chooses the join algorithm by comparing the data
> > transmission size of broadcast and shuffle joins while generating the
> > physical execution plan. What confuses me is why Impala chooses broadcast
> > as the default implementation (for example, when a table has no computed
> > stats)?
> >
> > In my experience, a shuffle join may be the better choice. Some of my
> > queries use a broadcast join between two subqueries with huge result sets,
> > and the query time differs by up to ten times (8s versus 80s).
> >
> > I think users should always compute stats for every partition. Do you
> > have any good suggestions about this?
> >
> > Thanks a lot
> >
>
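
For readers following the thread, the trade-off discussed above can be sketched roughly as follows. This is a simplified illustration, not Impala's actual planner code: the function name `choose_join_strategy`, its parameters, and the cost formulas (broadcast sends the right-hand side to every node; a partitioned join sends each side across the network once) are assumptions made for the example. When a size estimate is missing because stats were never computed, the sketch falls back to a configurable default, which is the behaviour the review mentioned above proposes to set to partitioned.

```python
from typing import Optional


def choose_join_strategy(
    lhs_bytes: Optional[int],
    rhs_bytes: Optional[int],
    num_nodes: int,
    default: str = "partitioned",
) -> str:
    """Pick "broadcast" or "partitioned" from estimated network transfer.

    Hypothetical helper for illustration only; a real planner also weighs
    memory use, existing data partitioning, and other factors.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")

    # Without stats there is no size estimate, so use the configured default.
    if lhs_bytes is None or rhs_bytes is None:
        return default

    # Assumed model: broadcast copies the right-hand side to every node.
    broadcast_cost = rhs_bytes * num_nodes
    # Assumed model: a partitioned (shuffle) join moves both inputs once.
    partitioned_cost = lhs_bytes + rhs_bytes

    return "broadcast" if broadcast_cost <= partitioned_cost else "partitioned"


if __name__ == "__main__":
    # Small right-hand side: broadcasting it is cheaper than shuffling both.
    print(choose_join_strategy(10**9, 10**6, num_nodes=10))
    # Two large inputs: broadcasting one to every node costs more.
    print(choose_join_strategy(10**9, 10**9, num_nodes=10))
    # No stats: fall back to the default.
    print(choose_join_strategy(None, None, num_nodes=10))
```

Under this model, broadcasting a huge result set grows in cost with the number of nodes, which is consistent with the slowdown reported in the original question when two large subqueries are joined without stats.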
