hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-18908) FULL OUTER JOIN to MapJoin
Date Thu, 13 Sep 2018 20:53:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-18908:
--------------------------------
    Status: Patch Available  (was: In Progress)

Try again.

> FULL OUTER JOIN to MapJoin
> --------------------------
>
>                 Key: HIVE-18908
>                 URL: https://issues.apache.org/jira/browse/HIVE-18908
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: FULL OUTER MapJoin Code Changes.docx, HIVE-18908.01.patch, HIVE-18908.02.patch,
> HIVE-18908.03.patch, HIVE-18908.04.patch, HIVE-18908.05.patch, HIVE-18908.06.patch, HIVE-18908.08.patch,
> HIVE-18908.09.patch, HIVE-18908.091.patch, HIVE-18908.092.patch, HIVE-18908.093.patch, HIVE-18908.096.patch,
> HIVE-18908.097.patch, HIVE-18908.098.patch, HIVE-18908.099.patch, HIVE-18908.0991.patch, HIVE-18908.0992.patch,
> HIVE-18908.0993.patch, HIVE-18908.0994.patch, HIVE-18908.0995.patch, HIVE-18908.0996.patch,
> HIVE-18908.0997.patch, HIVE-18908.0998.patch, HIVE-18908.0999.patch, HIVE-18908.09991.patch,
> HIVE-18908.09992.patch, HIVE-18908.09993.patch, HIVE-18908.09994.patch, HIVE-18908.09995.patch,
> HIVE-18908.09996.patch, JOIN to MAPJOIN Transformation.pdf, SHARED-MEMORY FULL OUTER MapJoin.pdf
>
>
> Currently, we do not support FULL OUTER JOIN in MapJoin.
> Rough TPC-DS timings, run on a laptop:
> (NOTE: In Query 51 the PTF is a bigger serial portion of the run -- Amdahl's law at play)
> With FULL OUTER MapJoin OFF, the join runs as a MergeJoin.
> Query 51:
>   o Vectorization OFF
>       • FULL OUTER MapJoin OFF: 4:30 minutes
>       • FULL OUTER MapJoin ON: 4:37 minutes
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 2:35 minutes
>       • FULL OUTER MapJoin ON: 1:47 minutes
> Query 97:
>   o Vectorization OFF
>       • FULL OUTER MapJoin OFF: 2:37 minutes
>       • FULL OUTER MapJoin ON: 2:42 minutes
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 1:17 minutes
>       • FULL OUTER MapJoin ON: 0:06 minutes
> FULL OUTER Join 10,000,000 rows against 323,910 small table keys
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 14:56 minutes
>       • FULL OUTER MapJoin ON: 1:45 minutes
> FULL OUTER Join 10,000,000 rows against 1,000 small table keys
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 12:37 minutes
>       • FULL OUTER MapJoin ON: 1:38 minutes
> Hopefully, someone will do large-scale cluster testing.  [DynamicPartitionedHashJoin]
> MapJoin should scale dramatically better than [Sort] MergeJoin reduce-shuffle.
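
For readers comparing the rough laptop timings quoted above, the following is a minimal sketch that converts each minutes:seconds figure to seconds and computes the ratio of the MapJoin OFF (MergeJoin) time to the MapJoin ON time. The helper names `to_seconds` and `speedups` and the row labels are illustrative assumptions, not part of Hive or the patch; the timings themselves are copied from the issue description.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '14:56' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (label, FULL OUTER MapJoin OFF i.e. MergeJoin, FULL OUTER MapJoin ON)
# Timings are the rough laptop numbers from the issue description.
TIMINGS = [
    ("Query 51, vectorization OFF", "4:30", "4:37"),
    ("Query 51, vectorization ON", "2:35", "1:47"),
    ("Query 97, vectorization OFF", "2:37", "2:42"),
    ("Query 97, vectorization ON", "1:17", "0:06"),
    ("10,000,000 rows vs 323,910 keys, vectorization ON", "14:56", "1:45"),
    ("10,000,000 rows vs 1,000 keys, vectorization ON", "12:37", "1:38"),
]


def speedups(timings):
    """Return (label, OFF time / ON time) pairs; a ratio above 1 favors MapJoin."""
    return [(label, to_seconds(off) / to_seconds(on)) for label, off, on in timings]


for label, ratio in speedups(TIMINGS):
    print(f"{label}: {ratio:.2f}x")
```

On these numbers the ratio is slightly below 1 in both vectorization-OFF runs and above 1 in every vectorization-ON run, with Query 97 showing the largest gain. Since these are single rough runs on a laptop, the ratios are indicative only.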



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
