impala-dev mailing list archives

From yu feng
Subject about broadcast join and hash shuffle join
Date Fri, 05 May 2017 09:44:42 GMT
Hi All:

I see that Impala chooses the join algorithm by comparing the data
transmission size of broadcast join and shuffle join while generating the
physical execution plan. What confuses me is why Impala chooses broadcast
as the default implementation (for example, when stats have not been
computed for a table).
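
To make the comparison concrete, here is a minimal sketch of how I understand
the choice. It is not Impala's actual planner code: the function name
choose_join_strategy and the cost formulas (broadcast sends the right-hand
side to every node, shuffle sends both sides across the network once) are my
simplifying assumptions. The fallback to broadcast when sizes are unknown is
the behavior I am asking about.

```python
from typing import Optional


def choose_join_strategy(
    lhs_bytes: Optional[int],
    rhs_bytes: Optional[int],
    num_nodes: int,
) -> str:
    """Hypothetical helper: pick a join strategy by estimated bytes transmitted.

    Assumed cost model (a simplification, not taken from Impala's source):
      broadcast cost = rhs_bytes * num_nodes  (right side copied to every node)
      shuffle cost   = lhs_bytes + rhs_bytes  (both sides repartitioned once)
    None means the size is unknown, e.g. stats were never computed.
    """
    if lhs_bytes is None or rhs_bytes is None:
        # No stats: the costs cannot be compared, so fall back to the
        # default described in this message.
        return "broadcast"

    broadcast_cost = rhs_bytes * num_nodes
    shuffle_cost = lhs_bytes + rhs_bytes
    return "broadcast" if broadcast_cost <= shuffle_cost else "shuffle"


if __name__ == "__main__":
    # Small right side: copying it to 10 nodes is cheaper than shuffling both.
    print(choose_join_strategy(1_000, 10, 10))
    # Two large inputs: broadcasting the right side to 10 nodes costs more.
    print(choose_join_strategy(1_000, 1_000, 10))
    # Missing stats: the default applies regardless of the real sizes.
    print(choose_join_strategy(None, None, 10))
```

Under this model, the default only hurts when the unknown right side is
actually large, which matches the case described next.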

In my experience, shuffle join may be the better choice. Some of my
queries use a broadcast join between two subqueries with huge result sets,
and the query time differs by up to ten times (8s vs. 80s).

I think users should always compute stats for every partition. Do you
have any good suggestions about this?

Thanks a lot
