kafka-jira mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (KAFKA-4608) RocksDBWindowStore.fetch() is inefficient for large ranges
Date Thu, 15 Feb 2018 00:35:03 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-4608:
    Comment: was deleted

(was: I have filed https://issues.apache.org/jira/browse/KAFKA-6560 to tackle this issue;
it aims to use only point queries for window stores rather than range queries.)

> RocksDBWindowStore.fetch() is inefficient for large ranges
> ----------------------------------------------------------
>                 Key: KAFKA-4608
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4608
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions:
>            Reporter: Elias Levy
>            Priority: Major
> It is not unreasonable for a user to call {{RocksDBWindowStore.fetch}} to scan for a
> key across a large time range. For instance, someone may call it with a {{timeFrom}} of zero
> or a {{timeTo}} of max long in an attempt to fetch every match for a key across all time,
> forwards or backwards.
> But if you do so, {{fetch}} will peg the CPU, as it attempts to iterate over every single
> segment id in the range. That is obviously very inefficient.
> {{fetch}} should trim the {{timeFrom}}/{{timeTo}} range based on the available time range
> in the {{segments}} hash map, so that it only iterates over the available time range.
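
The trimming proposed in the description could look roughly like the sketch below. This is not the Kafka Streams code: the class {{SegmentRangeTrim}}, its methods, and the use of a sorted {{TreeMap}} in place of the {{segments}} hash map are all assumptions made for illustration, as is the rule that a segment id is the timestamp divided by a fixed segment interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for the window store's segment bookkeeping.
class SegmentRangeTrim {
    // Assumption: segments are keyed by id in a sorted map, so the smallest
    // and largest existing ids are cheap to look up.
    private final TreeMap<Long, String> segments = new TreeMap<>();
    private final long segmentIntervalMs;

    public SegmentRangeTrim(long segmentIntervalMs) {
        this.segmentIntervalMs = segmentIntervalMs;
    }

    // Assumption: a segment id is the timestamp divided by the interval.
    public long segmentId(long timestamp) {
        return timestamp / segmentIntervalMs;
    }

    public void addSegment(long id) {
        segments.put(id, "segment-" + id);
    }

    // Returns the ids of existing segments that overlap [timeFrom, timeTo].
    public List<Long> segmentsToSearch(long timeFrom, long timeTo) {
        List<Long> result = new ArrayList<>();
        if (segments.isEmpty() || timeFrom > timeTo) {
            return result;
        }
        // Clamp the requested id range to the ids that actually exist, so a
        // query for [0, Long.MAX_VALUE] no longer walks every possible id.
        long from = Math.max(segmentId(timeFrom), segments.firstKey());
        long to = Math.min(segmentId(timeTo), segments.lastKey());
        for (long id = from; id <= to; id++) {
            if (segments.containsKey(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

With a 1000 ms interval and segments 5, 6 and 8, a call such as {{segmentsToSearch(0L, Long.MAX_VALUE)}} visits only ids 5 through 8 rather than every id up to max long divided by the interval. If the segments really are kept in a sorted map, iterating {{subMap(from, true, to, true)}} would also skip the gaps between existing ids.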
