kafka-jira mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xavier Léauté (JIRA) <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5285) Optimize upper / lower byte range for key range scan on windowed stores
Date Fri, 16 Feb 2018 21:22:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367892#comment-16367892

Xavier Léauté commented on KAFKA-5285:

[~davispw] yes I understand your concern, and that's part the reason I filed this JIRA in
the first place. Currently our internal APIs assume that knowing keyTo and timeTo is sufficient
to determine the upper bound. However, without information about the lower key, or timeFrom
it is not possible to determine an optimal upper bound for the byte range. Let me illustrate
the problem that [~guozhang] is trying to solve:

Let's say we query the key range {{A1}} through {{AAB}}, and timestamps {{0000}} through {{DFFF}}
(let's assume timestamps consist of 4 bytes for this example). We could have the following
A13333   (A1  | 3333)
AA0000   (AA  | 0000)
AAA0000  (AAA | 0000)
AAB1234  (AAB | 1234)
AAC321   (AA  | C321)
That is, it is possible for keys smaller than keyTo to show up after the bytes for (keyTo+maxSuffix).
This happens only if:
# keys within the range can be shorter than keyTo.
# The suffix (a.k.a. timestamp) bytes are greater than the bytes in the same position for

Without any additional information about the key length or or the lower bound, we can only
assume that keys are at least 1 byte, and that byte has to be smaller or equal to the first
byte of keyTo (i.e. our upper bound has to start with the first byte of keyTo), so our best
guess for and upper bound in that case is ADFFF.

NB: [~guozhang] in this particular case MAX(keyFrom + timeEndEarliest, keyTo + timeStartLatest)
= max(A1DFFF, AAB0000) = AAB0000 would not work since it does not include the full time range
for the AAB keys, or maybe I did not understand your suggestion.

Any key smaller than keyTo that appears after (keyTo+maxSuffix) is necessarily a prefix of
keyTo, so knowing the lower bound is only helpful if it shares a prefix with keyTo (e.g. if
in this case keyFrom was {{7F}}, then the lower bound would not be helpful).

However because the lower bound is {{A1}}, we now know keys starting with A have at least
two bytes, and the second byte has to be an A based on the previous statement. This means
me can reduce our upper bound to {{AADFFFF}}.

Also, if we have a smaller timeTo we can include any prefix of keyTo that does not contain
any byte larger than the first byte of  of timeTo, but it gets trickier once you have matching
bytes in the timestamp and the key[

Let me think this though a bit more and I'll try to come up with something that will work
optimally in most cases, but my hunch is that unless we introduce optimizations for stores
with fixed size keys, or have some information about the minimum key length, we will end up
scanning a large portion of the table for long time ranges.

> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>                 Key: KAFKA-5285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5285
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Xavier Léauté
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: performance
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} {{upperRange}}
and {{lowerRange}} does not make any assumptions with respect to the other key bound (e.g.
the upper byte bound does not depends on lower key bound).
> It should be possible to optimize the byte range somewhat further using the information
provided by the lower bound.
> More specifically, by incorporating that information, we should be able to eliminate
the corresponding {{upperRangeFixedSize}} and {{lowerRangeFixedSize}}, since the result should
be the same if we implement that optimization.

This message was sent by Atlassian JIRA

View raw message