flink-dev mailing list archives

From "Chengxiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2240) Use BloomFilter to minimize build side records which spilled to disk in Hybrid-Hash-Join
Date Thu, 18 Jun 2015 09:03:01 GMT
Chengxiang Li created FLINK-2240:
------------------------------------

             Summary: Use BloomFilter to minimize build side records which spilled to disk
in Hybrid-Hash-Join
                 Key: FLINK-2240
                 URL: https://issues.apache.org/jira/browse/FLINK-2240
             Project: Flink
          Issue Type: Improvement
          Components: Core
            Reporter: Chengxiang Li
            Priority: Minor


In Hybrid-Hash-Join, when the small (build-side) table does not fit into memory, part of its
data is spilled to disk, and the corresponding partitions of the big (probe-side) table are
spilled to disk during the probe phase as well. If we build a BloomFilter while spilling the
small table to disk during the build phase, and use it to filter the big table records that
would otherwise be spilled, this may greatly reduce the size of the spilled big table files
and save the disk IO cost of writing them and reading them back.
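
The idea above can be illustrated with a minimal sketch. This is not Flink's
hash join code: the class SpillBloomFilterSketch, the nested SimpleBloomFilter,
and the method filterProbeKeys are hypothetical names made up for illustration,
join keys are simplified to long values, and an inner join is assumed, so a
probe record with no possible build-side match can simply be dropped.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SpillBloomFilterSketch {

    /** Minimal Bloom filter over long join keys (hypothetical, not a Flink class). */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // 64-bit mixing function (the splitmix64 finalizer) to spread the key bits.
        private static long mix(long z) {
            z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
            z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
            return z ^ (z >>> 31);
        }

        // Double hashing: the i-th bit position is derived from h1 + i * h2.
        private int position(long key, int i) {
            long h = mix(key);
            int h1 = (int) h;
            int h2 = ((int) (h >>> 32)) | 1; // force h2 to be odd, hence non-zero
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(long key) {
            for (int i = 0; i < numHashes; i++) {
                bits.set(position(key, i));
            }
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(long key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Build phase: every build-side key that is spilled is added to the filter.
     * Probe phase: only probe keys that pass the filter are kept for spilling.
     */
    static List<Long> filterProbeKeys(long[] spilledBuildKeys, long[] probeKeys,
                                      int numBits, int numHashes) {
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (long key : spilledBuildKeys) {
            filter.add(key);
        }
        List<Long> toSpill = new ArrayList<>();
        for (long key : probeKeys) {
            if (filter.mightContain(key)) {
                toSpill.add(key); // possible match, still has to go to disk
            }
        }
        return toSpill;
    }

    public static void main(String[] args) {
        long[] build = {1L, 2L, 3L, 4L, 5L};
        long[] probe = new long[1000];
        for (int i = 0; i < probe.length; i++) {
            probe[i] = i;
        }
        List<Long> kept = filterProbeKeys(build, probe, 1 << 12, 3);
        System.out.println("probe records: " + probe.length
                + ", still spilled: " + kept.size());
    }
}
```

A Bloom filter never reports a key it has seen as absent, so every probe record
with a real match is still spilled; false positives only cost some extra IO. The
filter size and hash count used here (4096 bits, 3 hashes) are arbitrary values
chosen for the example.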



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
