nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Drop event disordering
Date Mon, 11 Dec 2017 13:58:06 GMT
Omer

It can certainly be looked at in time for 1.5, but based on recent
progress/discussion 1.5 may well kick off rather soon.  It is
something someone would need to take up, and that someone need not be a
committer at this point.  Your finding that switching between the
different provenance repositories has no real impact is helpful, though,
as it suggests the issue lies higher up, during the generation of the
events, so that should help narrow it down.

Thanks
Joe

On Mon, Dec 11, 2017 at 8:51 AM, Omer Hadari <hadari.omer@gmail.com> wrote:
> Hi, we have looked into it some more and tried using
> WriteAheadProvenanceRepository, to no real avail. Sorry for the ping, but do
> you have any other ideas? We don’t mind pursuing them and this problem is
> critical for us. Any chance this could be looked at for 1.5?
>
> Thanks!
>
> On Fri, 24 Nov 2017 at 13:19 Omer Hadari <hadari.omer@gmail.com> wrote:
>>
>> Opened an issue
>> NIFI-4638
>>
>> On Wed, 22 Nov 2017 at 20:39 Omer Hadari <hadari.omer@gmail.com> wrote:
>>>
>>> Yes we were thinking in that direction as well, that’s why I mentioned
>>> the 1ms part. I do not know how events are assigned an ordinal though, so
>>> it’s unclear to me whether the disordering is constant but is usually
>>> “hidden” since there is no rollover, or maybe there is some kind of a race
>>> condition. You’ll probably be able to answer these questions quicker than
>>> me, I’ll open a jira as soon as I get home later today.
>>>
>>> Thanks again!
>>>
>>> On Wed, 22 Nov 2017 at 20:34 Mark Payne <markap14@hotmail.com> wrote:
>>>>
>>>> Omer,
>>>>
>>>> Yes, I think that is sufficient. I think the issue is that the framework
>>>> is creating both the ATTRIBUTES_MODIFIED and DROP events, and the
>>>> generation of these objects is very fast. But if the timestamp happens to
>>>> 'rollover' from millisecond 1 to millisecond 2, for example, those events
>>>> get different timestamps. So I think it's just a timing thing that will be
>>>> somewhat difficult to reproduce reliably. But just a description of the
>>>> behavior that you're experiencing should be fine.
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>> On Nov 22, 2017, at 1:04 PM, Omer Hadari <hadari.omer@gmail.com> wrote:
>>>>
>>>> I’ll be glad to open a jira, though the problem is hardly coherent imo,
>>>> what would you like to see there? Simply “Disordering of drop events” and
>>>> the explanation I have here? Sadly I cannot provide a concrete example since
>>>> the problem does not reproduce.
>>>>
>>>> On Wed, 22 Nov 2017 at 18:23 Joe Witt <joe.witt@gmail.com> wrote:
>>>>>
>>>>> also - awesome find!  And glad you're at such a level with provenance
>>>>> data to catch that.  Thanks Omer!
>>>>>
>>>>> On Wed, Nov 22, 2017 at 11:21 AM, Mark Payne <markap14@hotmail.com>
>>>>> wrote:
>>>>> > Omer,
>>>>> >
>>>>> > This is likely an issue related to the order in which we generate
>>>>> > those events in the framework.
>>>>> > Do you mind filing a JIRA?
>>>>> >
>>>>> > Thanks
>>>>> > -Mark
>>>>> >
>>>>> >
>>>>> >> On Nov 22, 2017, at 10:51 AM, Omer Hadari <hadari.omer@gmail.com>
>>>>> >> wrote:
>>>>> >>
>>>>> >> Hi!
>>>>> >> We’ve been using NiFi for a while now, and we save all provenance
>>>>> >> events for logging purposes and such. We encountered an issue while looking
>>>>> >> at lineages of some flow files, which showed drop events as if they happened
>>>>> >> before another event, that in fact preceded it (and indeed has a lower event
>>>>> >> ordinal).
>>>>> >>
>>>>> >> For example in a split json processor, the original FlowFile is
>>>>> >> dropped after all splits happen and are assigned fragment counts, but still
>>>>> >> the timestamp of the drop event is earlier than the timestamp of the
>>>>> >> attributes modified event. That causes the graph to look as if the
>>>>> >> attributes modified event comes out of the drop event, which doesn’t really
>>>>> >> make sense to us (should it?). It’s probably worth noting that the drop
>>>>> >> event ordinal is higher than the attributes modified event ordinal. Also we
>>>>> >> noticed that
>>>>> >> 1. This only happens once every few thousand events.
>>>>> >> 2. This does not reproduce by replaying.
>>>>> >> 3. The drop event’s timestamp is earlier by 1ms in the cases we
>>>>> >> encountered, and the ordinal is always larger by one.
>>>>> >>
>>>>> >> This might be an error with the split json processor or a more
>>>>> >> general one. We’d love any clues or corrections to misconceptions we might
>>>>> >> have (maybe this is not a problem and drop events can precede other events?)
>>>>> >>
>>>>> >> Thank you!
>>>>> >
>>>>
>>>>
>
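
The symptom described in the thread (a DROP event whose ordinal is one
higher than the ATTRIBUTES_MODIFIED event but whose timestamp is 1 ms
earlier) can be illustrated with a minimal sketch. Everything here is
hypothetical: the Event class, the ordinals and the timestamp values are
made up for illustration and are not NiFi's ProvenanceEventRecord or its
repository code. The sketch only shows why a view ordered by timestamp
disagrees with one ordered by ordinal when the two values are captured
across a millisecond boundary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EventOrderingSketch {

    // Hypothetical stand-in for a provenance event (not a NiFi class).
    static final class Event {
        final long ordinal;          // assigned when the event is stored
        final String type;
        final long timestampMillis;  // captured when the event object is built

        Event(long ordinal, String type, long timestampMillis) {
            this.ordinal = ordinal;
            this.type = type;
            this.timestampMillis = timestampMillis;
        }
    }

    // Made-up values matching the reported pattern: the DROP ordinal is
    // larger by one, while its timestamp is earlier by 1 ms.
    static List<Event> sample() {
        List<Event> events = new ArrayList<>();
        events.add(new Event(41L, "ATTRIBUTES_MODIFIED", 1_000_002L));
        events.add(new Event(42L, "DROP", 1_000_001L));
        return events;
    }

    // Ordering a lineage view by timestamp puts DROP first.
    static List<Event> sortedByTimestamp(List<Event> events) {
        List<Event> copy = new ArrayList<>(events);
        copy.sort(Comparator.comparingLong((Event e) -> e.timestampMillis));
        return copy;
    }

    // Ordering by ordinal keeps DROP last.
    static List<Event> sortedByOrdinal(List<Event> events) {
        List<Event> copy = new ArrayList<>(events);
        copy.sort(Comparator.comparingLong((Event e) -> e.ordinal));
        return copy;
    }

    public static void main(String[] args) {
        List<Event> events = sample();
        System.out.println("first by timestamp: " + sortedByTimestamp(events).get(0).type);
        System.out.println("first by ordinal:   " + sortedByOrdinal(events).get(0).type);
    }
}
```

Whether the real cause is the order in which the framework captures the
two timestamps is exactly what NIFI-4638 was opened to determine; the
sketch makes no claim about that.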
