kafka-jira mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4750) KeyValueIterator returns null values
Date Mon, 26 Jun 2017 23:07:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063945#comment-16063945 ]

Guozhang Wang commented on KAFKA-4750:

[~mjsax][~evis] [~mihbor] Thanks for your comments. I would like to think a bit more on the
general resolution for this case though before reviewing [~evis]'s patch:

1. In Kafka messages, "null" byte arrays indicate tombstones. Note that this means that if a
user's serde serializes any object into null for a log-compacted topic (e.g. the changelog
topic of a state store), the record is deleted from the store.

2. In Kafka Streams state stores, we did NOT enforce whether "null" indicates deletion from the
store, per the javadoc of {{put}}:

    /**
     * Update the value associated with this key
     * @param key The key to associate the value to
     * @param value The value, it can be null.
     * @throws NullPointerException If null is used for key.
     */
    void put(K key, V value);

However, our implementation did treat a value-typed "null" (note this is not the "null" byte
array of a serialized message) as a deletion, since we implement {{delete(key)}} as {{put(key,
null)}}. As Evgeny / Michal mentioned, it would be intuitive if our {{put}} semantics aligned with
Java's map operations:

...  // store initialized as empty

store.get(key); // returns null

store.put(key, value);
store.get(key);  // returns value

store.put(key, value);
store.put(key, null);  // we can interpret it as "associate the key with null" or simply "delete this key"
store.get(key);  // returns null, though generally speaking it could indicate either that the key is associated with null or that the key does not exist
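For comparison, here is a minimal sketch of how a plain java.util.HashMap behaves for the same sequence of calls. The class name and the describe helper are made up for illustration; only the Map calls are standard library. It shows that get alone cannot distinguish "absent" from "mapped to null", while containsKey can:

```java
import java.util.HashMap;
import java.util.Map;

public class MapNullSemantics {

    // Hypothetical helper: reports whether a key is absent, mapped to null, or mapped to a value.
    public static String describe(Map<String, String> map, String key) {
        if (!map.containsKey(key)) {
            return "absent";
        }
        String value = map.get(key);
        return value == null ? "present, mapped to null" : "present, mapped to " + value;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();  // initialized as empty
        System.out.println(describe(map, "k"));     // absent

        map.put("k", "v");
        System.out.println(describe(map, "k"));     // present, mapped to v

        map.put("k", null);                         // HashMap keeps the key, associated with null
        System.out.println(map.get("k"));           // null
        System.out.println(describe(map, "k"));     // present, mapped to null

        map.remove("k");                            // explicit removal
        System.out.println(map.get("k"));           // null
        System.out.println(describe(map, "k"));     // absent
    }
}
```

In other words, HashMap takes the "associate the key with null" reading of put(key, null) and keeps removal as a separate operation.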

Now assume you have a customized serde that maps a "null" object to "not-null" byte arrays;
in this case the above would still hold:

store.put(key, value);
store.put(key, null);  // now the "null" object is just a special value that does not indicate deletion
store.get(key);  // returns null, but this should be interpreted as "the key is associated with null"

Now assume you have a customized serde that maps a "not-null" object to "null" byte arrays;
in this case the "not-null" object is really interpreted as a dummy value, and the above still
holds:

store.put(key, value);
store.put(key, MY_DUMMY);  // serialized into "null" byte arrays
store.get(key);  // returns MY_DUMMY, since "null" byte arrays are deserialized symmetrically

So I think if we want to allow the above customized interpretations, then we should not implement
{{delete()}} as {{put(key, null)}}, since "null" objects may not indicate deletions; if we
want to be stricter, then we should state that explicitly in the javadoc above, e.g. "@param value
The value, it can be null which indicates deletion of the key".
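To make the first option concrete, here is a rough sketch of a store whose delete does not go through put(key, null). Everything in it (the class names, the one-byte marker encoding, the in-memory map) is an assumption for illustration and not the Kafka Streams implementation. The serde maps a "null" object to a "not-null" byte array, as in the second example above:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class ExplicitDeleteSketch {

    // Hypothetical serde: a null object becomes the NON-null marker {0};
    // any other string becomes {1} followed by its UTF-8 bytes.
    public static byte[] serialize(String value) {
        if (value == null) {
            return new byte[] {0};
        }
        byte[] text = value.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[text.length + 1];
        out[0] = 1;
        System.arraycopy(text, 0, out, 1, text.length);
        return out;
    }

    public static String deserialize(byte[] raw) {
        if (raw == null || raw[0] == 0) {
            return null;  // absent key, or the explicit "null" marker
        }
        return new String(raw, 1, raw.length - 1, StandardCharsets.UTF_8);
    }

    // Hypothetical in-memory store keyed by String.
    public static class Store {
        private final Map<String, byte[]> inner = new HashMap<>();

        // Only "null" BYTES are treated as a tombstone; a null object is left to the serde.
        public void put(String key, String value) {
            byte[] raw = serialize(value);
            if (raw == null) {
                inner.remove(key);
            } else {
                inner.put(key, raw);
            }
        }

        public String get(String key) {
            return deserialize(inner.get(key));
        }

        // Deletion bypasses the serde entirely, so it works for any serde.
        public void delete(String key) {
            inner.remove(key);
        }

        public boolean contains(String key) {
            return inner.containsKey(key);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put("k", "v");
        store.put("k", null);  // key stays, associated with null
        System.out.println(store.get("k") + " " + store.contains("k"));  // null true
        store.delete("k");     // key is really gone
        System.out.println(store.get("k") + " " + store.contains("k"));  // null false
    }
}
```

With this serde, a delete implemented as put(key, null) would leave the key in the store and an iterator over it would surface a null value, which matches the behaviour described in the issue.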


> KeyValueIterator returns null values
> ------------------------------------
>                 Key: KAFKA-4750
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4750
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions:,,
>            Reporter: Michal Borowiecki
>            Assignee: Evgeny Veretennikov
>              Labels: newbie
>         Attachments: DeleteTest.java
> The API for ReadOnlyKeyValueStore.range method promises the returned iterator will not
return null values. However, after upgrading from to we found null values
are returned causing NPEs on our side.
> I found this happens after removing entries from the store and I found resemblance to
SAMZA-94 defect. The problem seems to be as it was there, when deleting entries and having
a serializer that does not return null when null is passed in, the state store doesn't actually
delete that key/value pair but the iterator will return null value for that key.
> When I modified our serializer to return null when null is passed in, the problem went
away. However, I believe this should be fixed in kafka streams, perhaps with a similar approach
as SAMZA-94.

This message was sent by Atlassian JIRA
