kafka-jira mailing list archives

From "Guozhang Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4750) KeyValueIterator returns null values
Date Tue, 27 Jun 2017 17:19:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065155#comment-16065155
] 

Guozhang Wang commented on KAFKA-4750:
--------------------------------------

[~evis] Inside the RocksDB store, if serialization yields a "null" byte array (NOTE: this is
not the same as a "null" object being passed into the API), then we should always treat it as a
delete call; i.e. the current implementation inside RocksDB is ok:

{code}
private void putInternal(byte[] rawKey, byte[] rawValue) {
    if (rawValue == null) {
        try {
            db.delete(wOptions, rawKey);
        } catch (RocksDBException e) {
            ...
        }
    } else {
        try {
            db.put(wOptions, rawKey, rawValue);
        } catch (RocksDBException e) {
            ...
        }
    }
}
{code}
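
To illustrate the distinction between a "null" object and "null" bytes, here is a minimal sketch, not the actual Kafka code. The class {{RawStoreSketch}}, the {{TreeMap}} standing in for RocksDB, the String keys, and both serializer helpers are hypothetical names introduced only for this example. It shows that the raw layer deletes only when it actually receives null bytes, so a serializer that maps a null object to non-null bytes never triggers the delete branch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the RocksDB-backed store; a TreeMap replaces the database.
// Keys are Strings here only because byte[] does not work as a map key.
final class RawStoreSketch {
    private final Map<String, byte[]> db = new TreeMap<>();

    // Mirrors the null check in putInternal above: null bytes mean "delete".
    void putInternal(String rawKey, byte[] rawValue) {
        if (rawValue == null) {
            db.remove(rawKey);
        } else {
            db.put(rawKey, rawValue);
        }
    }

    boolean contains(String rawKey) {
        return db.containsKey(rawKey);
    }

    // A serializer that passes a null object through as null bytes.
    static byte[] nullPreservingSerialize(String value) {
        return value == null ? null : value.getBytes(StandardCharsets.UTF_8);
    }

    // An assumed user serializer that turns a null object into non-null (empty) bytes.
    static byte[] lenientSerialize(String value) {
        return value == null ? new byte[0] : value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RawStoreSketch store = new RawStoreSketch();
        store.putInternal("k", nullPreservingSerialize("v"));
        store.putInternal("k", nullPreservingSerialize(null)); // null bytes: entry removed
        System.out.println(store.contains("k"));
        store.putInternal("k", lenientSerialize(null));        // empty bytes: entry stored
        System.out.println(store.contains("k"));
    }
}
```

With the null-preserving serializer the key is removed; with the lenient one an entry with empty bytes stays in the store.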

The question is whether, at the API layer, we also want to enforce that a "null" object indicates
deletion. Currently we are a bit vague about this. I am proposing two options to make it clear:

1) Clarify in the javadoc that a null value in {{put(key, value)}} indicates deletion; if the value
is a "null" object, bypass the serde and send "null" bytes directly into the inner functions, and vice
versa for deserialization; do not prescribe how user-customized serdes handle null values,
since we are not going to call them with null values any more.

2) Do NOT enforce in the javadoc that a null value in {{put(key, value)}} indicates deletion; implement
all {{delete(key)}} calls directly throughout all the layers of stores instead of calling {{put(key,
null)}}; recommend that user-customized serdes handle null values themselves.
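
The two options can be compared in a rough sketch, under stated assumptions. {{ApiLayerSketch}}, {{putOption1}}, {{putOption2}}, the {{HashMap}} inner store and the String-typed keys and values are all hypothetical; none of them is an existing Kafka Streams class or method. In both variants the raw layer keeps treating null bytes as a delete, as in {{putInternal}} above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical typed store layered over a raw byte store (a HashMap here).
final class ApiLayerSketch {
    private final Map<String, byte[]> inner = new HashMap<>();
    private final Function<String, byte[]> serializer;

    ApiLayerSketch(Function<String, byte[]> serializer) {
        this.serializer = serializer;
    }

    // Option 1: a null value means deletion and the serde is never called with null.
    void putOption1(String key, String value) {
        byte[] raw = (value == null) ? null : serializer.apply(value);
        putRaw(key, raw);
    }

    // Option 2: put always goes through the serde, which decides how to encode null.
    void putOption2(String key, String value) {
        putRaw(key, serializer.apply(value));
    }

    // Option 2: deletion has its own path instead of being routed through put(key, null).
    void delete(String key) {
        inner.remove(key);
    }

    // Raw layer: null bytes still mean delete.
    private void putRaw(String key, byte[] raw) {
        if (raw == null) {
            inner.remove(key);
        } else {
            inner.put(key, raw);
        }
    }

    boolean contains(String key) {
        return inner.containsKey(key);
    }

    public static void main(String[] args) {
        // Assumed user serde that encodes a null object as empty bytes.
        Function<String, byte[]> lenient =
            v -> v == null ? new byte[0] : v.getBytes(StandardCharsets.UTF_8);
        ApiLayerSketch store = new ApiLayerSketch(lenient);

        store.putOption1("a", "x");
        store.putOption1("a", null);   // deleted, serde not consulted
        System.out.println(store.contains("a"));

        store.putOption2("b", null);   // stored as empty bytes by this serde
        store.delete("b");             // explicit delete removes it
        System.out.println(store.contains("b"));
    }
}
```

Under option 1 the behavior of {{put(key, null)}} does not depend on the user's serde; under option 2 it does, and only {{delete(key)}} is guaranteed to remove the entry.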

I am a bit inclined towards the second option, and [~mjsax] seems to favor the first option.
I'd like to hear what others think.

> KeyValueIterator returns null values
> ------------------------------------
>
>                 Key: KAFKA-4750
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4750
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.1, 0.11.0.0, 0.10.2.1
>            Reporter: Michal Borowiecki
>            Assignee: Evgeny Veretennikov
>              Labels: newbie
>         Attachments: DeleteTest.java
>
>
> The API for ReadOnlyKeyValueStore.range method promises the returned iterator will not
> return null values. However, after upgrading from 0.10.0.0 to 0.10.1.1 we found null values
> are returned causing NPEs on our side.
> I found this happens after removing entries from the store and I found resemblance to
> SAMZA-94 defect. The problem seems to be as it was there, when deleting entries and having
> a serializer that does not return null when null is passed in, the state store doesn't actually
> delete that key/value pair but the iterator will return null value for that key.
> When I modified our serializer to return null when null is passed in, the problem went
> away. However, I believe this should be fixed in kafka streams, perhaps with a similar approach
> as SAMZA-94.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
