nifi-dev mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Publishing Kafka Topics via Apache Nifi cluster.
Date Thu, 31 May 2018 13:43:20 GMT
Hello,

If I'm understanding the situation correctly, you want ordering within
a key, but not necessarily total ordering across all your data?

I'm making this assumption since you said you have 9 partitions on
your Kafka topic and you are partitioning by key, so the data for each
key is in order per partition.

The list + fetch pattern with redistribution doesn't have a way to
control how the data is distributed. It is just round-robin: you
can control the batch size, but you can't partition the data to nodes
based on a key.
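
To make the difference concrete, here is a minimal sketch (plain Python,
not NiFi code) contrasting round-robin assignment with a hypothetical
key-based assignment. The file names, node names, and the by_key function
are assumptions for illustration only, and it simplifies by pretending
each file carries a single key.

```python
from itertools import cycle
import zlib

def round_robin(files, nodes):
    """Assign files to nodes in turn, ignoring the key entirely."""
    node_cycle = cycle(nodes)
    return [(name, next(node_cycle)) for name, _key in files]

def by_key(files, nodes):
    """Hypothetical alternative: choose the node from a stable hash of the key."""
    return [(name, nodes[zlib.crc32(key.encode()) % len(nodes)])
            for name, key in files]

# (file name, key) pairs; illustrative data only
files = [("file1", "111"), ("file2", "111"), ("file3", "222")]
nodes = ["node1", "node2", "node3"]

rr = round_robin(files, nodes)   # file1 and file2 land on different nodes
keyed = by_key(files, nodes)     # file1 and file2 land on the same node
```

With round-robin, the two files for key 111 end up on different nodes,
so either node may publish first. That is the reordering described in the
original question.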

There is an EnforceOrder processor [1] which was made to help with
this kind of scenario. I believe it was built specifically for CDC
(change data capture) cases where the event log has to be processed
in order. I haven't used it myself
so maybe others can help here, but I believe you would use your "key"
as the "Group Identifier" and then somehow you need to get an integer
value on each flow file that represents the order within the group. So
for example your A-event flow file would need some kind of attribute
like "order = 1" and then the B-event flow file would need an
attribute like "order = 2". You might be able to assign this order
using an UpdateAttribute processor right after the ListSFTP, but you
have to do it per key somehow.
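
The idea of a group identifier plus an integer order can be sketched as
follows. This is a conceptual illustration in plain Python under my own
assumptions, not the actual EnforceOrder implementation: the OrderEnforcer
class, its offer method, and the starting order of 1 are all hypothetical.

```python
from collections import defaultdict

class OrderEnforcer:
    """Toy model: hold items per group until the next expected order arrives."""

    def __init__(self, initial_order=1):
        self.initial_order = initial_order      # assumed starting order per group
        self.expected = {}                      # group -> next order to release
        self.waiting = defaultdict(dict)        # group -> {order: payload}

    def offer(self, group, order, payload):
        """Accept one item and return the payloads that can now be released."""
        expected = self.expected.setdefault(group, self.initial_order)
        self.waiting[group][order] = payload
        released = []
        # Release consecutive items starting from the expected order.
        while expected in self.waiting[group]:
            released.append(self.waiting[group].pop(expected))
            expected += 1
        self.expected[group] = expected
        return released

enforcer = OrderEnforcer()
first = enforcer.offer("111", 2, "B-event")   # arrives early, so it is held
second = enforcer.offer("111", 1, "A-event")  # unblocks both, A before B
```

Here the B-event for key 111 arrives first and is held back until the
A-event shows up, at which point both are released in order. Each key is
tracked independently, so a delay on one key does not block another.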

Another option is to just run the whole flow on the primary node without
doing the site-to-site redistribution, but then you lose out on
parallel processing, and even on a single node I believe there are
cases where ordering is not guaranteed.

Thanks,

Bryan

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.EnforceOrder/index.html


On Wed, May 30, 2018 at 4:35 PM, rey26 <reyaan26@gmail.com> wrote:
> Hello Team,
>
> We have an Apache NiFi cluster with 3 nodes and a 3-node Kafka cluster. We are
> receiving some files which have transactions in order (A-type first and then
> B-type).
> These events are in order but may come in different files. For example, the
> A-event for id 111 can be present in file 1 and the B-event can come in
> the immediately following file 2 [B will always come after
> A-type for any ID]. We want the data to be published in the same order as it
> is received.
>
> We developed a flow using the ListSFTP + FetchFTP + PublishKafka combination, in
> that order. We have also partitioned the Kafka topic [9 partitions] on the
> basis of a key column,
> and the same key is used in the PublishKafka processor.
>
> All the events are published to the same partition, but within the
> partition they are out of order.
> For example, B-type events are coming before A-type events in Kafka topic TEST.
>
> Now I have some queries regarding the above:
>
> I understand that ListSFTP + FetchFTP improves load
> balancing, but does it ensure ordering?
> File1 may go to Node1 and File2 may go to Node2, and Node2 can publish the
> record to the same partition on Kafka before Node1?
> Is there any way to guarantee the load order of files in Apache NiFi in cluster
> mode, keeping performance in mind?
>
> Since each task in the PublishKafka processor is one publisher, if we run
> PublishKafka on only the primary node and pass only one broker id, will that
> do the trick?
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/
