impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Tauber-Marshall <>
Subject Re: about broadcast join and hash shuffle join
Date Fri, 05 May 2017 14:20:48 GMT
There's actually a review out right now for changing the default join
algorithm when stats are unavailable to partitioned:

On Fri, May 5, 2017 at 4:44 AM yu feng <> wrote:

> Hi All:
> I find impala choose join algorithm by comparing data transmission size
> between broad cast and shuffle join while generating physical execution
> plan. what I am confused is why impala choose broadcast as default
> implement(such as table do not compute stats) ?
> In my experience, shuffle join maybe the better choice, and some of my
> queries use broadcast join between two subquery with huge resultset and the
> query costs has difference up to ten times (8s and 80s).
> I think user should always compute stats for every partition, do you guys
> have some good suggestion about this.
> Thanks a lot

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message