hudi-commits mailing list archives

From "sivabalan narayanan (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HUDI-686) Implement BloomIndexV2 that does not depend on memory caching
Date Mon, 23 Mar 2020 01:31:00 GMT

    [ https://issues.apache.org/jira/browse/HUDI-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064472#comment-17064472 ]

sivabalan narayanan commented on HUDI-686:
------------------------------------------

Interesting implementation, [~vinoth]. Some initial thoughts:
 * Regarding candidates, I don't think we will run into OOM, since they are bounded to one partition.
 * May I know why we need the external spillable map? Why can't we use a regular map? I don't
see the benefit of an external spillable map if all entries can be held in memory. Here too, one
executor has to hold at most the file infos for a single partition, right? So, as I understand
it, memory is bounded here as well.
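
To make the trade-off in the second point concrete, here is a minimal sketch of what a spillable map does differently from a regular map. It is an illustration only, not Hudi's implementation: the class name SpillableMap, the max_in_memory parameter, and the choice to cap by entry count (rather than by estimated bytes) are all assumptions made for this example. If the per-partition entries always fit under the cap, it behaves exactly like a plain dict, which is the case the question is about.

```python
import os
import pickle
import tempfile


class SpillableMap:
    """Toy map (hypothetical, not a Hudi class): keeps up to
    max_in_memory entries in a dict and writes the rest to a temp file."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self._memory = {}          # key -> value, held on the heap
        self._disk_index = {}      # key -> (offset, length) in the spill file
        self._file = tempfile.TemporaryFile()

    def put(self, key, value):
        in_memory = key in self._memory
        has_room = len(self._memory) < self.max_in_memory
        if in_memory or (has_room and key not in self._disk_index):
            self._memory[key] = value
            return
        # Over the cap: append the serialized value to the spill file
        # and remember only its location in memory.
        payload = pickle.dumps(value)
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(payload)
        self._disk_index[key] = (offset, len(payload))

    def get(self, key):
        if key in self._memory:
            return self._memory[key]
        offset, length = self._disk_index[key]  # KeyError if absent
        self._file.seek(offset)
        return pickle.loads(self._file.read(length))

    def spilled_count(self):
        return len(self._disk_index)

    def __len__(self):
        return len(self._memory) + len(self._disk_index)


# Usage: with a cap of 2, the third entry goes to disk but is still readable.
file_infos = SpillableMap(max_in_memory=2)
file_infos.put("file-1", {"min_key": "a", "max_key": "f"})
file_infos.put("file-2", {"min_key": "g", "max_key": "m"})
file_infos.put("file-3", {"min_key": "n", "max_key": "z"})
```

The only cost a spillable map adds over a regular map is the serialization and file I/O on the spilled entries, so it only pays off when the bound on entries per partition cannot be relied on.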

 

> Implement BloomIndexV2 that does not depend on memory caching
> -------------------------------------------------------------
>
>                 Key: HUDI-686
>                 URL: https://issues.apache.org/jira/browse/HUDI-686
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Index, Performance
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Major
>             Fix For: 0.6.0
>
>         Attachments: Screen Shot 2020-03-19 at 10.15.10 AM.png, Screen Shot 2020-03-19 at 10.15.10 AM.png, Screen Shot 2020-03-19 at 10.15.10 AM.png, image-2020-03-19-10-17-43-048.png
>
>
> The main goal here is to provide a much simpler index, without advanced optimizations like auto-tuned parallelism/skew handling, but with a better out-of-box experience for small workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
