flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
Date Thu, 06 Aug 2015 16:59:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660334#comment-14660334 ]

ASF GitHub Bot commented on FLINK-2240:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/888#issuecomment-128442757
  
    Oh, I forgot to add the closing message to the commit, so the ASF bot did not close the
pull request. Could you close the pull request manually? (Only you, as the owner, can do that.)


> Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2240
>                 URL: https://issues.apache.org/jira/browse/FLINK-2240
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table's
data is spilled to disk, and the corresponding partitions of the big table are spilled
to disk during the probe phase as well. If we build a BloomFilter while spilling the small table
to disk during the build phase, and use it to filter the big table records that would otherwise
be spilled to disk, this may greatly reduce the size of the spilled big table files and save the
disk IO cost of writing and later re-reading them.
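
The idea in the description above can be sketched roughly as follows. This is a minimal
illustration only, not Flink's implementation: the class SpillBloomFilterSketch, its nested
BloomFilter, and the method probeKeysToSpill are hypothetical names, keys are assumed to be
plain ints, and the filter size and hash count are picked by the caller. It also assumes an
inner join, where a probe record with no build-side match can be dropped safely.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch; names and structure are assumptions, not Flink's actual code.
class SpillBloomFilterSketch {

    /** Minimal Bloom filter over int key hashes, using double hashing. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Records a key hash; called for every build-side record that is spilled. */
        void add(int keyHash) {
            int h1 = mix(keyHash);
            int h2 = mix(h1) | 1; // odd step so successive probes differ
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** False means "definitely absent"; true means "possibly present". */
        boolean mightContain(int keyHash) {
            int h1 = mix(keyHash);
            int h2 = mix(h1) | 1;
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        /** Integer bit mixer (murmur3-style finalizer) to spread key hashes. */
        private static int mix(int h) {
            h ^= h >>> 16;
            h *= 0x85ebca6b;
            h ^= h >>> 13;
            h *= 0xc2b2ae35;
            h ^= h >>> 16;
            return h;
        }
    }

    /**
     * Build phase: add every spilled build-side key to the filter.
     * Probe phase: keep only the probe keys the filter might contain;
     * the rest cannot match a spilled build record and need not be written.
     */
    static List<Integer> probeKeysToSpill(
            int[] spilledBuildKeys, int[] probeKeys, int numBits, int numHashes) {
        BloomFilter filter = new BloomFilter(numBits, numHashes);
        for (int key : spilledBuildKeys) {
            filter.add(Integer.hashCode(key));
        }
        List<Integer> toSpill = new ArrayList<>();
        for (int key : probeKeys) {
            if (filter.mightContain(Integer.hashCode(key))) {
                toSpill.add(key);
            }
        }
        return toSpill;
    }
}
```

A Bloom filter never reports a false negative, so every probe record that really has a
spilled build-side partner is still written. False positives only cost some extra spilled
records; they do not affect the join result.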



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
