nifi-dev mailing list archives

From Koji Kawamura <ijokaruma...@gmail.com>
Subject Re: NIFI-4715 : ListS3 list duplicate files when incoming file throughput to S3 is high
Date Tue, 26 Dec 2017 09:54:25 GMT
Hi Milan,

Thanks for your contribution! I reviewed the PR and posted a comment there.
Would you check that?

Koji

On Sat, Dec 23, 2017 at 7:15 AM, Milan Das <mdas@interset.com> wrote:

> I have logged a defect in NiFi. ListS3 is generating duplicate flowfiles
> when S3 throughput is high.
>
>
>
> Root cause:
> The problem happens when files are uploaded to S3 while a ListS3 run is in
> progress. In onTrigger(), maxTimestamp is initialized to 0L, which causes
> the set of already-listed keys to be cleared (see the sketch below).
>
> When an S3 object's lastModified time equals currentTimestamp and its key
> was already listed, the object should be skipped. Because the key set has
> been cleared, the same file is listed again.
> I think the fix should be to initialize maxTimestamp with currentTimestamp
> rather than 0L.
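>
> A simplified sketch of the pattern (not the exact ListS3 source; names are
> approximate) that shows how the clear leads to duplicates:
>
>     // State persisted between onTrigger() runs (approximation):
>     //   currentTimestamp - newest lastModified seen in the previous run
>     //   currentKeys      - keys already emitted at currentTimestamp
>     long maxTimestamp = 0L;  // starts at 0L, so any object "advances" it
>     for (S3ObjectSummary summary : listing) {
>         long lastModified = summary.getLastModified().getTime();
>         // Skip objects that were already listed in an earlier run.
>         if (lastModified < currentTimestamp
>                 || (lastModified == currentTimestamp
>                     && currentKeys.contains(summary.getKey()))) {
>             continue;
>         }
>         if (lastModified > maxTimestamp) {
>             maxTimestamp = lastModified;
>             // 0L is smaller than any real lastModified, so this clear
>             // fires on the first new object and wipes the keys recorded
>             // at currentTimestamp ...
>             currentKeys.clear();
>         }
>         currentKeys.add(summary.getKey());
>         // ... emit a FlowFile for this object ...
>     }
>     currentTimestamp = maxTimestamp;
>
> ... after which objects whose lastModified equals currentTimestamp are no
> longer found in currentKeys and get emitted a second time.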
>
>
> https://issues.apache.org/jira/browse/NIFI-4715
>
> The fix I made already seems fine and is working for us:
>
> long maxTimestamp = currentTimestamp;
>
> With maxTimestamp starting at currentTimestamp instead of 0L, objects whose
> lastModified equals currentTimestamp no longer trigger the clear, so keys
> listed in the previous run are still skipped.
>
> I wanted to check thoughts from other experts, or whether there is any
> other known fix.
>
> Regards,
>
> Milan Das
> Sr. System Architect
>
> email: mdas@interset.com
> mobile: +1 678 216 5660
>
> LinkedIn: https://www.linkedin.com/in/milandas/
> www.interset.com
>
