kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stanislav Kozlovski <stanis...@confluent.io>
Subject Re: [VOTE] KIP-334 Include partitions in exceptions raised during consumer record deserialization/validation
Date Fri, 11 Oct 2019 17:05:31 GMT
Thanks Jason. I've edited the KIP with the latest proposal.

On Thu, Oct 10, 2019 at 2:00 AM Jason Gustafson <jason@confluent.io> wrote:

> Hi Stanislav,
>
> Sorry for the late comment. I'm happy with the current proposal. Just one
> small request is to include an example which shows how a user could use
> this to skip over a record.
>
> I'd suggest pushing this to a vote to see if anyone else has feedback.
>
> Thanks,
> Jason
>
> On Sat, Jul 13, 2019 at 2:27 PM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Hello,
> >
> > Could we restart the discussion here again?
> >
> > My last message was sent on the 3rd of June but I haven't received
> replies
> > since then.
> >
> > I'd like to get this KIP to a finished state, regardless of whether that
> is
> > merged-in or discarded. It has been almost one year since the publication
> > of the KIP.
> >
> > Thanks,
> > Stanislav
> >
> > On Mon, Jun 3, 2019 at 11:19 AM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Do people agree with the approach I outlined in my last reply?
> > >
> > > On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > > wrote:
> > >
> > >> Hey there Kamal,
> > >>
> > >> I'm sincerely sorry for missing your earlier message. As I open this
> > >> thread up, I see I have an unsent draft message about resuming
> > discussion
> > >> from some time ago.
> > >>
> > >> In retrospect, I think I may have been too pedantic with the exception
> > >> naming and hierarchy.
> > >> I now believe a single exception type of
> > `RecordDeserializationException`
> > >> is enough. Let's go with that.
> > >>
> > >> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash <
> > >> kamal.chandraprakash@gmail.com> wrote:
> > >>
> > >>> Matthias,
> > >>>
> > >>> We already have CorruptRecordException which doesn't extend the
> > >>> SerializationException. So, we need an alternate
> > >>> name suggestion for the corrupted record error if we decide to extend
> > the
> > >>> FaultyRecordException class.
> > >>>
> > >>> Stanislav,
> > >>>
> > >>> Our users are also facing this error. Could we bump up this
> discussion?
> > >>>
> > >>> I think we can have a single exception type
> > >>> FaultyRecordException/RecordDeserialization exception to capture both
> > >>> the errors. We can add an additional enum field to differentiate the
> > >>> errors
> > >>> if required.
> > >>>
> > >>> Thanks,
> > >>> Kamal Chandraprakash
> > >>>
> > >>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash <
> > >>> kamal.chandraprakash@gmail.com> wrote:
> > >>>
> > >>> > Stanislav,
> > >>> >
> > >>> > Any updates on this KIP? We have internal users who want to skip
> the
> > >>> > corrupted message while consuming the records.
> > >>> >
> > >>> >
> > >>> > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <
> > >>> matthias@confluent.io>
> > >>> > wrote:
> > >>> >
> > >>> >> I am not 100% familiar with the details of the consumer code,
> > however
> > >>> I
> > >>> >> tend to disagree with:
> > >>> >>
> > >>> >> > There's no difference between the two cases -- if (and
only if)
> > the
> > >>> >> message is corrupt, it can't be deserialized.  If (and only
if) it
> > >>> can't be
> > >>> >> deserialized, it is corrupt.
> > >>> >>
> > >>> >> Assume that a user configures a JSON deserializer but a faulty
> > >>> upstream
> > >>> >> producer writes an Avro message. For this case, the message
is not
> > >>> >> corrupted, but still can't be deserialized. And I can imaging
that
> > >>> users
> > >>> >> want to handle both cases differently.
> > >>> >>
> > >>> >> Thus, I think it makes sense to have two different exceptions
> > >>> >> `RecordDeserializationException` and `CorruptedRecordException`
> that
> > >>> can
> > >>> >> both extend `FaultyRecordException` (don't like this name
too much
> > >>> >> honestly, but don't have a better idea for it anyway).
> > >>> >>
> > >>> >> Side remark. If we introduce class
> `RecordDeserializationException`
> > >>> and
> > >>> >> `CorruptedRecordException`, we can also add an interface that
both
> > >>> >> implement to return partition/offset information and let both
> extend
> > >>> >> `SerializationException` directly without an intermediate
class in
> > the
> > >>> >> exception hierarchy.
> > >>> >>
> > >>> >>
> > >>> >> -Matthias
> > >>> >>
> > >>> >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
> > >>> >> >> If you are inheriting from SerializationException,
your derived
> > >>> class
> > >>> >> > should also be a kind of serialization exception.  Not
something
> > >>> more
> > >>> >> > general.
> > >>> >> > Yeah, the reason for inheriting it would be for
> > >>> backwards-compatibility.
> > >>> >> >
> > >>> >> >> Hmm.  Can you think of any new scenarios that would
make Kafka
> > >>> force
> > >>> >> the
> > >>> >> > user need to skip a specific record?  Perhaps one scenario
is if
> > >>> records
> > >>> >> > are lost but we don't know how many.
> > >>> >> > Not on the spot, but I do wonder how likely a new scenario
is to
> > >>> >> surface in
> > >>> >> > the future and how we'd handle the exceptions' class
hierarchy
> > then.
> > >>> >> >
> > >>> >> >> Which offset were we planning to use in the
> > >>> >> > exception?
> > >>> >> > The offset of the record which caused the exception.
In the case
> > of
> > >>> >> > batches, we use the last offset of the batch. In both
cases,
> users
> > >>> >> should
> > >>> >> > have to seek +1 from the given offset. You can review
the PR to
> > >>> ensure
> > >>> >> its
> > >>> >> > accurate
> > >>> >> >
> > >>> >> >
> > >>> >> > If both of you prefer `RecordDeserializationException`,
we can
> go
> > >>> with
> > >>> >> > that. Please do confirm that is okay
> > >>> >> >
> > >>> >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <
> > jason@confluent.io
> > >>> >
> > >>> >> wrote:
> > >>> >> >
> > >>> >> >> One difference between the two cases is that we can't
generally
> > >>> trust
> > >>> >> the
> > >>> >> >> offset of a corrupt message. Which offset were we
planning to
> use
> > >>> in
> > >>> >> the
> > >>> >> >> exception? Maybe it should be either the fetch offset
or one
> plus
> > >>> the
> > >>> >> last
> > >>> >> >> consumed offset? I think I'm with Colin in preferring
> > >>> >> >> RecordDeserializationException for both cases if
possible. For
> > one
> > >>> >> thing,
> > >>> >> >> that makes the behavior consistent whether or not
`check.crs`
> is
> > >>> >> enabled.
> > >>> >> >>
> > >>> >> >> -Jason
> > >>> >> >>
> > >>> >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <
> > cmccabe@apache.org>
> > >>> >> wrote:
> > >>> >> >>
> > >>> >> >>> Hi Stanislav,
> > >>> >> >>>
> > >>> >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski
wrote:
> > >>> >> >>>> Hey Colin,
> > >>> >> >>>>
> > >>> >> >>>> It may be a bit vague but keep in mind we
also raise the
> > >>> exception in
> > >>> >> >> the
> > >>> >> >>>> case where the record is corrupted.
> > >>> >> >>>> We discussed with Jason offline that message
corruption most
> > >>> often
> > >>> >> >>> prevents
> > >>> >> >>>> deserialization itself and that may be enough
of an argument
> to
> > >>> raise
> > >>> >> >>>> `RecordDeserializationException` in the case
of a corrupt
> > >>> record. I
> > >>> >> >>>> personally find that misleading.
> > >>> >> >>>
> > >>> >> >>> Hmm.  I think that by definition, corrupt records
are records
> > that
> > >>> >> can't
> > >>> >> >>> be deserialized.  There's no difference between
the two cases
> --
> > >>> if
> > >>> >> (and
> > >>> >> >>> only if) the message is corrupt, it can't be
deserialized.  If
> > >>> (and
> > >>> >> only
> > >>> >> >>> if) it can't be deserialized, it is corrupt.
> > >>> >> >>>
> > >>> >> >>>>
> > >>> >> >>>> In the end, I think it might be worth it
to have a bit of a
> > >>> >> >>>> wider-encompassing `FaultyRecordException`
(or even
> > >>> >> >>>> `UnconsumableRecordException`) which would
allow users to
> skip
> > >>> over
> > >>> >> the
> > >>> >> >>>> record.
> > >>> >> >>>
> > >>> >> >>> If you are inheriting from SerializationException,
your
> derived
> > >>> class
> > >>> >> >>> should also be a kind of serialization exception.
 Not
> something
> > >>> more
> > >>> >> >>> general.
> > >>> >> >>>
> > >>> >> >>>> We could then potentially have more specific
exceptions (e.g
> > >>> >> >>>> deserialization) inherit that but I'm not
sure if we should.
> > >>> >> >>>> A case for a more general exception which
provides access to
> > the
> > >>> >> >>>> partition/offset is future backwards-compatibility.
If there
> is
> > >>> ever
> > >>> >> a
> > >>> >> >>> new
> > >>> >> >>>> scenario that would make the user need to
skip a specific
> > record
> > >>> -
> > >>> >> >> there
> > >>> >> >>>> would already be such an exception and this
will provide some
> > >>> >> >>>> backward-compatibility with older clients.
> > >>> >> >>>
> > >>> >> >>> Hmm.  Can you think of any new scenarios that
would make Kafka
> > >>> force
> > >>> >> the
> > >>> >> >>> user need to skip a specific record?  Perhaps
one scenario is
> if
> > >>> >> records
> > >>> >> >>> are lost but we don't know how many.
> > >>> >> >>>
> > >>> >> >>> If we really want to have something more general,
perhaps we
> > need
> > >>> >> >>> something like LostDataException.  But in that
case, we can't
> > >>> inherit
> > >>> >> >> from
> > >>> >> >>> SerializationException, since lost data is more
general than
> > >>> >> >> serialization
> > >>> >> >>> issues.
> > >>> >> >>>
> > >>> >> >>> best,
> > >>> >> >>> Colin
> > >>> >> >>>
> > >>> >> >>>
> > >>> >> >>>>
> > >>> >> >>>> Best,
> > >>> >> >>>> Stanislav
> > >>> >> >>>>
> > >>> >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe
<
> > cmccabe@apache.org
> > >>> >
> > >>> >> >> wrote:
> > >>> >> >>>>
> > >>> >> >>>>> Hi Stanislav,
> > >>> >> >>>>>
> > >>> >> >>>>> Thanks for the KIP.
> > >>> >> >>>>>
> > >>> >> >>>>> As as user, "FaultyRecordException" seems
a little vague.
> > >>> What's
> > >>> >> >>> faulty
> > >>> >> >>>>> about it?  Is it too big?  Does it not
match the schema of
> the
> > >>> other
> > >>> >> >>>>> records?  Was it not found?
> > >>> >> >>>>>
> > >>> >> >>>>> Of course, we know that the specific
problem is with
> > >>> [de]serializing
> > >>> >> >>> the
> > >>> >> >>>>> record, not with any of those those things.
 So we should
> > >>> choose a
> > >>> >> >> name
> > >>> >> >>>>> that reflects that serialization is the
problem.  How about
> > >>> >> >>>>> RecordSerializationException?
> > >>> >> >>>>>
> > >>> >> >>>>> best,
> > >>> >> >>>>> Colin
> > >>> >> >>>>>
> > >>> >> >>>>>
> > >>> >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav
Kozlovski wrote:
> > >>> >> >>>>>> Hi Jason and Ted,
> > >>> >> >>>>>>
> > >>> >> >>>>>> @Jason
> > >>> >> >>>>>> I did not oppose the idea as I'm
not too familiar with Java
> > >>> >> >>> conventions.
> > >>> >> >>>>> I
> > >>> >> >>>>>> agree it is a non-ideal way for the
user to catch the
> > >>> exception so
> > >>> >> >> I
> > >>> >> >>>>>> changed it back.
> > >>> >> >>>>>>
> > >>> >> >>>>>> I've updated the KIP and PR with
the latest changes. Now,
> > >>> there is
> > >>> >> >>> only
> > >>> >> >>>>> one
> > >>> >> >>>>>> public exception `FaultyRecordException`
which is thrown to
> > the
> > >>> >> >> user
> > >>> >> >>> in
> > >>> >> >>>>>> both scenarios (corrupt record and
deserialization error).
> > >>> >> >>>>>> Please take a look again.
> > >>> >> >>>>>>
> > >>> >> >>>>>> Best,
> > >>> >> >>>>>> Stanislav
> > >>> >> >>>>>>
> > >>> >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason
Gustafson <
> > >>> jason@confluent.io
> > >>> >> >>>
> > >>> >> >>>>> wrote:
> > >>> >> >>>>>>
> > >>> >> >>>>>>> Hey Stanislav,
> > >>> >> >>>>>>>
> > >>> >> >>>>>>> I should have noticed it earlier
from your example, but I
> > just
> > >>> >> >>> realized
> > >>> >> >>>>>>> that interfaces don't mix well
with exceptions. There is
> no
> > >>> way
> > >>> >> >> to
> > >>> >> >>>>> catch
> > >>> >> >>>>>>> the interface type, which means
you have to depend on
> > >>> instanceof
> > >>> >> >>>>> checks,
> > >>> >> >>>>>>> which is not very conventional.
At the moment, we raise
> > >>> >> >>>>>>> SerializationException if there
is a problem parsing the
> > >>> error,
> > >>> >> >>> and we
> > >>> >> >>>>>>> raise KafkaException if the record
fails its CRC check.
> > Since
> > >>> >> >>>>>>> SerializationException extends
KafkaExeption, it seems
> like
> > we
> > >>> >> >> can
> > >>> >> >>>>> handle
> > >>> >> >>>>>>> both cases in a compatible way
by handling both cases
> with a
> > >>> >> >> single
> > >>> >> >>>>> type
> > >>> >> >>>>>>> that extends SerializationException.
Maybe something like
> > >>> >> >>>>>>> RecordDeserializationException?
> > >>> >> >>>>>>>
> > >>> >> >>>>>>> Thanks,
> > >>> >> >>>>>>> Jason
> > >>> >> >>>>>>>
> > >>> >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM,
Ted Yu <
> yuzhihong@gmail.com
> > >
> > >>> >> >>> wrote:
> > >>> >> >>>>>>>
> > >>> >> >>>>>>>> +1
> > >>> >> >>>>>>>> -------- Original message
--------From: Stanislav
> > Kozlovski <
> > >>> >> >>>>>>>> stanislav@confluent.io>
Date: 8/2/18  2:41 AM
> (GMT-08:00)
> > >>> To:
> > >>> >> >>>>>>>> dev@kafka.apache.org Subject:
[VOTE] KIP-334 Include
> > >>> >> >> partitions
> > >>> >> >>> in
> > >>> >> >>>>>>>> exceptions raised during
consumer record
> > >>> >> >>> deserialization/validation
> > >>> >> >>>>>>>> Hey everybody,
> > >>> >> >>>>>>>>
> > >>> >> >>>>>>>> I'd like to start a vote
thread for KIP-334 Include
> > >>> partitions
> > >>> >> >> in
> > >>> >> >>>>>>>> exceptions raised during
consumer record
> > >>> >> >>> deserialization/validation
> > >>> >> >>>>>>>> <
> > >>> >> >>>>>>>
> > >>> >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?
> > >>> >> >>> pageId=87297793
> > >>> >> >>>>>>>>>
> > >>> >> >>>>>>>>
> > >>> >> >>>>>>>> --
> > >>> >> >>>>>>>> Best,
> > >>> >> >>>>>>>> Stanislav
> > >>> >> >>>>>>>>
> > >>> >> >>>>>>>
> > >>> >> >>>>>>
> > >>> >> >>>>>>
> > >>> >> >>>>>> --
> > >>> >> >>>>>> Best,
> > >>> >> >>>>>> Stanislav
> > >>> >> >>>>>
> > >>> >> >>>>
> > >>> >> >>>>
> > >>> >> >>>> --
> > >>> >> >>>> Best,
> > >>> >> >>>> Stanislav
> > >>> >> >>>
> > >>> >> >>
> > >>> >> >
> > >>> >> >
> > >>> >>
> > >>> >>
> > >>>
> > >>
> > >>
> > >> --
> > >> Best,
> > >> Stanislav
> > >>
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
Best,
Stanislav

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message