kafka-jira mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4750) KeyValueIterator returns null values
Date Mon, 26 Jun 2017 23:07:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063945#comment-16063945 ]

Guozhang Wang commented on KAFKA-4750:
--------------------------------------

[~mjsax] [~evis] [~mihbor] Thanks for your comments. I would like to think a bit more about the
general resolution for this case before reviewing [~evis]'s patch:

1. In Kafka messages, "null" byte arrays indicate tombstones. Note that this means that if a
user's serde decides to serialize any object into null for a log-compacted topic (e.g. the changelog
topic of a state store), it is meant to delete the record from the store.

2. In Kafka Streams state stores, we do NOT specify in the javadoc whether "null" indicates
deletion:

{code}
    /**
     * Update the value associated with this key
     *
     * @param key The key to associate the value to
     * @param value The value, it can be null.
     * @throws NullPointerException If null is used for key.
     */
    void put(K key, V value);
{code}

However, our implementation does treat a value-typed "null" (note this is not the "null" byte array
of a serialized message) as a deletion, since we implement {{delete(key)}} as {{put(key,
null)}}. As Evgeny / Michal mentioned, it would be intuitive for our {{put}} semantics to align
with Java's map operations:

{code}
...  // store initialized as empty

store.get(key); // returns null

store.put(key, value);
store.delete(key);
store.get(key);  // returns null

store.put(key, value);
store.put(key, null);  // can be read as "associate the key with null" or simply "delete this key"
store.get(key);  // returns null, though generally speaking it could indicate either that the
                 // key is associated with null or that the key does not exist
{code}
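
For comparison, here is a minimal sketch of how {{java.util.HashMap}} itself handles the two cases. It only uses the standard library; the class and method names are made up for illustration. {{get}} returns null in both cases, and only {{containsKey}} tells them apart:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: plain java.util.HashMap, not a Kafka Streams store.
final class MapNullSemantics {
    /** Returns {get(key), containsKey(key)} after put(key, value) then put(key, null). */
    static Object[] afterPutNull() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v");
        map.put("k", null);   // HashMap keeps the key and maps it to null
        return new Object[] {map.get("k"), map.containsKey("k")};
    }

    /** Returns {get(key), containsKey(key)} after put(key, value) then remove(key). */
    static Object[] afterRemove() {
        Map<String, String> map = new HashMap<>();
        map.put("k", "v");
        map.remove("k");      // the key is gone entirely
        return new Object[] {map.get("k"), map.containsKey("k")};
    }

    public static void main(String[] args) {
        Object[] a = afterPutNull();
        Object[] b = afterRemove();
        System.out.println("put null: get=" + a[0] + " containsKey=" + a[1]);
        System.out.println("remove:   get=" + b[0] + " containsKey=" + b[1]);
    }
}
```

So "aligning with Java's map" would mean that {{put(key, null)}} keeps the key, which is the first of the two interpretations above.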

Now assume you have a customized serde that maps the "null" object to a "not-null" byte array.
In this case the above would still hold:

{code}
store.put(key, value);
store.put(key, null);  // now the "null" object is just a special value that does not indicate deletion
store.get(key);  // returns null, but this should be interpreted as "the key is associated
                 // with null"
{code}
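
To make this case concrete, here is a rough sketch using a hypothetical in-memory store ({{MarkerSerdeStore}} is NOT the Kafka Streams API; the one-byte null marker and all names are assumptions for illustration). It assumes the byte-level store only removes an entry when the serialized value is a "null" byte array, and that {{delete(key)}} is implemented as {{put(key, null)}}:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a serializing state store; not the Kafka Streams API.
final class MarkerSerdeStore {
    // Assumed custom encoding: the null object becomes a non-null one-byte marker.
    static final byte[] NULL_MARKER = {0};
    private final TreeMap<String, byte[]> bytes = new TreeMap<>();

    static byte[] serialize(String value) {
        return value == null ? NULL_MARKER.clone() : value.getBytes(StandardCharsets.UTF_8);
    }

    static String deserialize(byte[] raw) {
        if (raw == null || Arrays.equals(raw, NULL_MARKER)) {
            return null;
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    void put(String key, String value) {
        byte[] raw = serialize(value);
        if (raw == null) {
            bytes.remove(key);   // only null bytes act as a tombstone
        } else {
            bytes.put(key, raw);
        }
    }

    // delete implemented as put(key, null), as discussed above
    void delete(String key) {
        put(key, null);
    }

    String get(String key) {
        return deserialize(bytes.get(key));
    }

    /** Deserialized values of all entries still held at the byte level. */
    List<String> values() {
        List<String> out = new ArrayList<>();
        for (byte[] raw : bytes.values()) {
            out.add(deserialize(raw));
        }
        return out;
    }

    int size() {
        return bytes.size();
    }
}
```

In this sketch, after {{put("a", "1")}} and {{delete("a")}} the entry is still there at the byte level, so {{get("a")}} returns null and iterating over {{values()}} also yields a null value. That matches the behaviour described in the issue.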

Now assume you have a customized serde that maps a "not-null" object to a "null" byte array.
In this case the "not-null" object is effectively a dummy value, and the above still
holds:

{code}
store.put(key, value);
store.put(key, MY_DUMMY);  // serialized into "null" byte arrays
store.get(key);  // returns MY_DUMMY, as the "null" byte array is deserialized symmetrically
{code}
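
The symmetric case can be sketched the same way. Again this is a hypothetical in-memory store ({{DummySerdeStore}}, the {{"<dummy>"}} sentinel and the helper names are assumptions, not Kafka Streams classes), under the assumption that "null" bytes remove the entry at the byte level:

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical stand-in for a serializing state store; not the Kafka Streams API.
final class DummySerdeStore {
    // Assumed sentinel object that the custom serde maps to null bytes.
    static final String MY_DUMMY = "<dummy>";
    private final TreeMap<String, byte[]> bytes = new TreeMap<>();

    static byte[] serialize(String value) {
        Objects.requireNonNull(value, "this sketch has no null object");
        return MY_DUMMY.equals(value) ? null : value.getBytes(StandardCharsets.UTF_8);
    }

    // Symmetric with serialize: null bytes come back as MY_DUMMY.
    static String deserialize(byte[] raw) {
        return raw == null ? MY_DUMMY : new String(raw, StandardCharsets.UTF_8);
    }

    void put(String key, String value) {
        byte[] raw = serialize(value);
        if (raw == null) {
            bytes.remove(key);   // null bytes act as a tombstone
        } else {
            bytes.put(key, raw);
        }
    }

    String get(String key) {
        return deserialize(bytes.get(key));
    }

    boolean hasBytes(String key) {
        return bytes.containsKey(key);
    }
}
```

Here {{put(key, MY_DUMMY)}} removes the bytes, yet {{get(key)}} still returns {{MY_DUMMY}}. One side effect in this sketch is that a key that was never written also reads back as {{MY_DUMMY}}, so the dummy value doubles as "absent".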

So I think if we want to allow the customized interpretations above, then we should not implement
{{delete()}} as {{put(key, null)}}, since "null" objects may not indicate deletions; if we
want to be stricter, then we should state explicitly in the javadoc above: "@param value
The value, it can be null which indicates deletion of the key".

WDYT?

> KeyValueIterator returns null values
> ------------------------------------
>
>                 Key: KAFKA-4750
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4750
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.1, 0.11.0.0, 0.10.2.1
>            Reporter: Michal Borowiecki
>            Assignee: Evgeny Veretennikov
>              Labels: newbie
>         Attachments: DeleteTest.java
>
>
> The API for ReadOnlyKeyValueStore.range method promises the returned iterator will not
> return null values. However, after upgrading from 0.10.0.0 to 0.10.1.1 we found null values
> are returned causing NPEs on our side.
> I found this happens after removing entries from the store and I found resemblance to
> SAMZA-94 defect. The problem seems to be the same as it was there: when deleting entries and having
> a serializer that does not return null when null is passed in, the state store doesn't actually
> delete that key/value pair but the iterator will return null value for that key.
> When I modified our serializer to return null when null is passed in, the problem went
> away. However, I believe this should be fixed in kafka streams, perhaps with a similar approach
> as SAMZA-94.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
