kafka-jira mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Richard Yu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-4499) Add "getAllKeys" API for querying windowed KTable stores
Date Wed, 27 Sep 2017 02:06:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181875#comment-16181875
] 

Richard Yu edited comment on KAFKA-4499 at 9/27/17 2:05 AM:
------------------------------------------------------------

I noticed a couple of things when looking through the {{ReadOnlyWindowStore}} interface. The
 {{all(long timeFrom, long timeTo)}} could be easily implemented because the {{fetch}} method
provided does almost exactly what [~mjsax] mentioned above, except focusing solely on a range
of given keys:
{code}
     /**
     * Get all the key-value pairs in the given key range and time range from all
     * the existing windows.
     *
     * @param from      the first key in the range
     * @param to        the last key in the range
     * @param timeFrom  time range start (inclusive)
     * @param timeTo    time range end (inclusive)
     * @return an iterator over windowed key-value pairs {@code <Windowed<K>, value>}
     * @throws InvalidStateStoreException if the store is not initialized
     * @throws NullPointerException If null is used for any key.
     */
    KeyValueIterator<Windowed<K>, V> fetch(K from, K to, long timeFrom, long timeTo);
{code}
Notice the words in the description: "Gets all the key-value pairs within the given key range
and the time range from... ".
Thus, the {{all(long timeFrom, long timeTo)}} could just call the above {{fetch}} method with
setting the {{from}} and {{to}} with there respective minimum and maximums. (e.g. {{return
fetch(key_min, key_max, windowStore_min, windowStore_max))}}) And the {{allLatest()}} method
would only have to check and disard the nonlatest windows available for each key.

The problem here however is that the finding the minimum and maximum key ranges (which in
effect must be solved by the {{keys()}} method). 
>From my understanding, there is no practical way to retrieve this information from within
the class--as the {{fetch()}} methods were already given the keys or key ranges. 

Ultimately, does the keys be inputted through external means?






was (Author: yohan123):
I noticed a couple of things when looking through the {{ReadOnlyWindowStore}} interface. The
 {{all(long timeFrom, long timeTo)}} could be easily implemented because the {{fetch}} method
provided does almost exactly what [~mjsax] mentioned above, except focusing solely on a range
of given keys:
{code}
     /**
     * Get all the key-value pairs in the given key range and time range from all
     * the existing windows.
     *
     * @param from      the first key in the range
     * @param to        the last key in the range
     * @param timeFrom  time range start (inclusive)
     * @param timeTo    time range end (inclusive)
     * @return an iterator over windowed key-value pairs {@code <Windowed<K>, value>}
     * @throws InvalidStateStoreException if the store is not initialized
     * @throws NullPointerException If null is used for any key.
     */
    KeyValueIterator<Windowed<K>, V> fetch(K from, K to, long timeFrom, long timeTo);
{code}
Notice the words in the description: "Gets all the key-value pairs within the given key range
and the time range from... ".
Thus, the {{all(long timeFrom, long timeTo)}} could just call the above {{fetch}} method with
setting the {{from}} and {{to}} with there respective minimum and maximums. (e.g. {{return
fetch(windowStore_min, windowStore_max))}}) And the {{allLatest()}} method would only have
to check and disard the nonlatest windows available for each key.

The problem here however is that the finding the minimum and maximum key ranges (which in
effect must be solved by the {{keys()}} method). 
>From my understanding, there is no practical way to retrieve this information from within
the class--as the {{fetch()}} methods were already given the keys or key ranges. 

Ultimately, does the keys be inputted through external means?





> Add "getAllKeys" API for querying windowed KTable stores
> --------------------------------------------------------
>
>                 Key: KAFKA-4499
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4499
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>              Labels: needs-kip
>
> Currently, both {{KTable}} and windowed-{{KTable}} stores can be queried via IQ feature.
While {{ReadOnlyKeyValueStore}} (for {{KTable}} stores) provide method {{all()}} to scan the
whole store (ie, returns an iterator over all stored key-value pairs), there is no similar
API for {{ReadOnlyWindowStore}} (for windowed-{{KTable}} stores).
> This limits the usage of a windowed store, because the user needs to know what keys are
stored in order the query it. It would be useful to provide possible APIs like this (only
a rough sketch):
>  - {{keys()}} returns all keys available in the store (maybe together with available
time ranges)
>  - {{all(long timeFrom, long timeTo)}} that returns all window for a specific time range
>  - {{allLatest()}} that returns the latest window for each key
> Because this feature would require to scan multiple segments (ie, RockDB instances) it
would be quite inefficient with current store design. Thus, this feature also required to
redesign the underlying window store itself.
> Because this is a major change, a KIP (https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
is required. The KIP should cover the actual API design as well as the store refactoring.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message