impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Subject Re: about broadcast join and hash shuffle join
Date Fri, 05 May 2017 14:20:48 GMT
There's actually a review out right now for changing the default join
algorithm when stats are unavailable to partitioned:
https://gerrit.cloudera.org/#/c/6803/

On Fri, May 5, 2017 at 4:44 AM yu feng <olaptestyu@gmail.com> wrote:

> Hi All:
>
> I find impala choose join algorithm by comparing data transmission size
> between broad cast and shuffle join while generating physical execution
> plan. what I am confused is why impala choose broadcast as default
> implement(such as table do not compute stats) ?
>
> In my experience, shuffle join maybe the better choice, and some of my
> queries use broadcast join between two subquery with huge resultset and the
> query costs has difference up to ten times (8s and 80s).
>
> I think user should always compute stats for every partition, do you guys
> have some good suggestion about this.
>
> Thanks a lot
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message