nifi-dev mailing list archives

From Koji Kawamura <ijokaruma...@gmail.com>
Subject Re: processors ListFile/ListSFTP do not store milliseconds in timestamp
Date Tue, 04 Jul 2017 13:21:22 GMT
Hi Roman, Joe S, and others,

I've finally made some progress on these ListXXX processor issues.

I have now confirmed that ListFile can list 100,000 files created with the
following command without missing any (this requires both NIFI-4069 and NIFI-3332):
for i in {1..100000}; do touch ./test_$i; done
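To check what timestamp precision a filesystem actually exposes to Java, independently of NiFi, a small standalone program like the one suggested elsewhere in this thread could look like the sketch below. It is an illustration, not part of NiFi: the class name MtimeResolutionCheck is made up, and the program only counts how many regular files in a directory have a non-zero millisecond part according to the NIO.2 API (Files.getLastModifiedTime) and the legacy File.lastModified() API. If both counts stay at zero on a filesystem that stores finer timestamps, the precision may be getting lost below NiFi, in the JVM or native layer.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic, not part of NiFi.
class MtimeResolutionCheck {

    // True when an epoch-millisecond timestamp has a non-zero sub-second part.
    static boolean hasMillis(long epochMillis) {
        return epochMillis % 1000 != 0;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int total = 0;
        int nioWithMillis = 0;
        int ioWithMillis = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) {
                    continue; // only plain files are of interest here
                }
                long nio = Files.getLastModifiedTime(p).toMillis(); // NIO.2 API
                long io = p.toFile().lastModified();               // legacy java.io API
                total++;
                if (hasMillis(nio)) nioWithMillis++;
                if (hasMillis(io)) ioWithMillis++;
            }
        }
        System.out.printf("files=%d, NIO with ms=%d, java.io with ms=%d%n",
                total, nioWithMillis, ioWithMillis);
    }
}
```

For example, run `java MtimeResolutionCheck /path/to/test/dir` after creating files with the touch loop above.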

1. ListFile can miss files on filesystems that do not provide
timestamps with millisecond precision (NIFI-4069)
#1915 is ready for review. This PR focuses only on solving the timestamp
precision issue.

2. ListFile can miss files whose timestamp is the same as the
latest previously processed timestamp (NIFI-3332)
#1975 is also ready for review.

3. ListFile cannot pick up files whose timestamp is older than the
latest previously processed timestamp (NIFI-2383)
I haven't done anything with this.
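Issue 1 can be illustrated with a minimal sketch of how a lister might infer timestamp precision and hold back entries whose time unit has not fully elapsed yet. This is not the code in #1915: the class, enum and method names, and the "coarsest unit that every observed timestamp is a multiple of" heuristic, are assumptions for illustration. The intuition is that on a filesystem with one-second precision, a file stamped 12:00:01 may still be followed by more files stamped 12:00:01, so listing it before 12:00:02 risks skipping the later ones.

```java
import java.util.List;

// Hypothetical sketch, not the NIFI-4069 implementation.
class PrecisionSketch {

    enum Precision { MILLIS, SECONDS, MINUTES }

    // Assumed heuristic: pick the coarsest unit that every observed timestamp is a multiple of.
    // An empty sample falls back to MINUTES, the most conservative choice.
    static Precision detect(List<Long> timestampsMillis) {
        boolean subSecond = false;
        boolean subMinute = false;
        for (long t : timestampsMillis) {
            if (t % 1_000L != 0) subSecond = true;
            if (t % 60_000L != 0) subMinute = true;
        }
        if (subSecond) return Precision.MILLIS;
        if (subMinute) return Precision.SECONDS;
        return Precision.MINUTES;
    }

    // Length of one time unit at the given precision, in milliseconds.
    static long unitMillis(Precision p) {
        switch (p) {
            case MILLIS:  return 1L;
            case SECONDS: return 1_000L;
            default:      return 60_000L;
        }
    }

    // Hold an entry back until the unit its timestamp falls in has fully elapsed,
    // so files written later within that same unit are not skipped.
    static boolean readyToList(long timestampMillis, long nowMillis, Precision p) {
        return nowMillis - timestampMillis >= unitMillis(p);
    }
}
```

With one-second precision, an entry stamped 1499174482000 is held back at 1499174482500 and becomes listable at 1499174483000. A single round timestamp on a millisecond filesystem could fool this heuristic, which is one reason to sample many entries.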

With #1 and #2, the ListXXX processors are reliable enough and strike a good
balance between reliability and efficiency. NIFI-3332 brings back storing
file identifiers in state, but only for files having the latest
timestamp. Previously, it stored all the identifiers it had processed.
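The state idea described above (keep the latest processed timestamp plus the identifiers seen at exactly that timestamp) can be sketched as follows. This is a hypothetical illustration, not NiFi's AbstractListProcessor: ListingState and newEntries are made-up names, and one listing cycle is modeled as a map from file identifier to last-modified milliseconds.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the NIFI-3332 idea, not NiFi's actual state handling.
class ListingState {
    private long latestTimestamp = Long.MIN_VALUE;     // highest timestamp processed so far
    private Set<String> idsAtLatest = new HashSet<>(); // IDs already emitted at exactly that timestamp

    // Compares one listing cycle (file ID -> last-modified millis) against the state left by
    // the previous cycle, returns the IDs treated as new, then advances the state.
    List<String> newEntries(Map<String, Long> listing) {
        List<String> emitted = new ArrayList<>();
        long nextLatest = latestTimestamp;
        Set<String> nextIds = new HashSet<>(idsAtLatest);
        for (Map.Entry<String, Long> e : listing.entrySet()) {
            String id = e.getKey();
            long ts = e.getValue();
            boolean isNew = ts > latestTimestamp
                    || (ts == latestTimestamp && !idsAtLatest.contains(id));
            if (!isNew) {
                continue; // already listed, or older than the stored timestamp (NIFI-2383)
            }
            emitted.add(id);
            if (ts > nextLatest) { // a newer timestamp replaces the tracked ID set
                nextLatest = ts;
                nextIds = new HashSet<>();
            }
            if (ts == nextLatest) {
                nextIds.add(id);
            }
        }
        latestTimestamp = nextLatest;
        idsAtLatest = nextIds;
        return emitted;
    }
}
```

In this model, a file that shows up in a later cycle with the same timestamp as an already listed file is still emitted, while a file stamped earlier than the stored timestamp is skipped, which is exactly the NIFI-2383 limitation.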

Could anyone review the PRs above?

Thanks
Koji

On Wed, Jun 21, 2017 at 2:01 PM, Koji Kawamura <ijokarumawak@gmail.com> wrote:
> Thanks Joe, I agree with you on the idea of making ListXXX as reliable
> as possible. Once that's done, I'm also interested in providing a different
> approach using watch APIs to cover use cases that ListXXX can't handle
> with timestamps alone.
>
> Roman, thanks for testing the change.
> The Test 1 and 2 results are as expected.
> Test 3 ... this might have been affected by the issue reported in
> NIFI-3332 (files having the same timestamp as those processed in the
> previous cycle). I'll take a look at whether there's anything we can do.
>
>> 2. I still do not see milliseconds, although my ext4 file system shows
>> the modification date in nanoseconds
>
> Roman, would you try writing a simple Java program to see whether the
> issue lies in the NiFi codebase or in the native code for your environment?
> A similar issue was reported on Stack Overflow:
> https://stackoverflow.com/questions/24804618/get-file-mtime-with-millisecond-resolution-from-java
>
> If the simple program can return timestamp in milliseconds, we should
> fix something in NiFi.
>
> I really appreciate your feedback! Thanks!
> Koji
>
> On Tue, Jun 20, 2017 at 9:17 PM, Roman <ramon9869@gmail.com> wrote:
>> Hello Koji,
>>
>> Thanks for NIFI-4069 (not NIFI-4096 =))
>>
>> I tested your PR in several ways on version: From a0f2834 on branch
>> nifi-4069
>>
>> Test 1:
>> 1. set Target System Timestamp Precision: Auto Detect
>> 2. start ListFile
>> 3. start script for i in {1..10000}; do touch ./test_$i; done
>>
>> Result: no miss files
>>
>>
>> Test 2:
>> 1. set Target System Timestamp Precision: Milliseconds
>> 2. start ListFile
>> 3. start script for i in {1..10000}; do touch ./test_$i; done
>>
>> Result: there are missing files
>>
>>
>> Test 3 and 4 (100k files):
>> 1. set Target System Timestamp Precision: Auto Detect
>> 2. start ListFile
>> 3. start script for i in {1..100000}; do touch ./test_$i; done
>>
>> Result: 68 and 40 files missing, respectively
>>
>>
>> In all tests, listing.timestamp and processed.timestamp still do not have
>> milliseconds.
>>
>>
>>
>> Summary:
>> 1. Now much better than it was. Thanks Koji for good job!
>> 2. I still do not see milliseconds, although my ext4 file system shows
>> the modification date in nanoseconds
>>
>>
>> Koji Kawamura-2 wrote
>>> Hi Roman and all,
>>>
>>> As I investigated the ListFile processor further, I found these are two
>>> different issues.
>>> I also found another JIRA related to ListFile. Currently there seem to
>>> be three issues:
>>>
>>> 1. ListFile can miss files on filesystems that do not provide
>>> timestamps with millisecond precision (NIFI-4096)
>>> 2. ListFile can miss files whose timestamp is the same as the
>>> latest previously processed timestamp (NIFI-3332)
>>> 3. ListFile cannot pick up files whose timestamp is older than the
>>> latest previously processed timestamp (NIFI-2383)
>>>
>>> # NIFI-4096
>>> I created JIRA NIFI-4096 to address issue #1 above by adding
>>> deterministic logic to detect the target filesystem's timestamp precision.
>>> With NIFI-4096, ListFile can list all 10,000 files created by the
>>> command you shared before without missing any:
>>>
>>> ```
>>> for i in {1..10000}; do touch ./test_$i; done
>>> ```
>>>
>>> The PR is ready for review. I'd appreciate it if you could test the fix
>>> with your use case.
>>>
>>> Additionally, I renamed variables in AbstractListProcessor to better
>>> explain their purpose and timestamp units. I hope it makes the code
>>> more readable and maintainable.
>>>
>>> # NIFI-3332
>>> I'm thinking about adding a processor property to specify whether to
>>> track the listed filenames that have the latest processed timestamp.
>>> Although it will be less efficient, it would be useful for some use cases.
>>>
>>> # NIFI-2383
>>> This is the most difficult case to handle correctly with timestamps alone.
>>> We need a different processor that can use a watch API.
>>>
>>> Any comment would be appreciated.
>>>
>>> Thanks,
>>> Koji
>>>
>>> On Tue, Jun 6, 2017 at 9:18 PM, Koji Kawamura <ijokarumawak@...> wrote:
>>>> Hi Roman,
>>>>
>>>> I think NIFI-3332 is probably related, as I can see that timestamps in
>>>> the logs don't have milliseconds.
>>>>
>>>> I've been considering how we can support all corner cases with minimal
>>>> state to persist, and make it work even if the filesystem only
>>>> provides last modified timestamps with one-second precision.
>>>> I'm changing code and testing locally, but it's not ready to send as a
>>>> PR yet, and I am not fully confident about how to fix it.
>>>>
>>>> Any suggestions or insights on making these ListXXXX processors better
>>>> would be appreciated.
>>>>
>>>> Thanks,
>>>> Koji
>>>>
>>>> On Tue, Jun 6, 2017 at 8:54 PM, Roman <ramon9869@...> wrote:
>>>>> Hi there,
>>>>>
>>>>> While digging into this issue, I found an open issue in JIRA, NIFI-3332
>>>>> <https://issues.apache.org/jira/browse/NIFI-3332>. Could it be related
>>>>> to my situation with missing milliseconds?
>>>>>
>>>>> Thanks
>>>>> Roman
>>>>>
>>>>>
>>>>> Koji Kawamura-2 wrote
>>>>>> Hello Roman,
>>>>>>
>>>>>> It seems the resolution of the last modified timestamp depends on the
>>>>>> file system implementation.
>>>>>> https://stackoverflow.com/questions/3805201/how-to-get-ubuntu-file-timestamp-in-millisecond
>>>>>>
>>>>>> I reproduced the same behavior on OS X, whose HFS file system has the
>>>>>> same limitation of one-second resolution.
>>>>>> https://stackoverflow.com/questions/18403588/how-to-return-millisecond-information-for-file-access-on-mac-os-x-in-java
>>>>>>
>>>>>> Which file system are you using on your Ubuntu machine? If it is ext3,
>>>>>> changing it to ext4 may address the issue.
>>>>>>
>>>>>> Thanks,
>>>>>> Koji
>>>>>>
>>>>>> On Thu, Jun 1, 2017 at 1:25 AM, Roman <ramon9869@...> wrote:
>>>>>>> Hi there, I need help.
>>>>>>>
>>>>>>> We are preparing a high-load project and tested these processors. We
>>>>>>> always see the listing.timestamp and processed.timestamp keys without
>>>>>>> milliseconds (xxxxxxxxxx000). As a result, if several files are
>>>>>>> generated within one second, not all of them are listed.
>>>>>>>
>>>>>>>
>>>>>>> Test:
>>>>>>> 1. start processor ListFile/ListSFTP
>>>>>>> 2. generate 10000 zero-size files with this command:
>>>>>>> for i in {1..10000}; do touch ./test_$i; done
>>>>>>> 3. see processor stats: out 3952 (0 bytes)
>>>>>>>
>>>>>>>
>>>>>>> Am I doing something wrong? Or is it a bug in NiFi/Java/etc.?
>>>>>>>
>>>>>>> Environment
>>>>>>>
>>>>>>> Ubuntu 14.04.5 LTS, x64, ext4 file system
>>>>>>> Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>>> Nifi 1.2.0 From 3a605af, Tagged nifi-1.2.0-RC2
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Roman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-nifi-developer-list.39713.n7.nabble.com/processors-ListFile-ListSFTP-do-not-store-milliseconds-in-timestamp-tp16037.html
>>>>>>> Sent from the Apache NiFi Developer List mailing list archive at
>>>>>>> Nabble.com.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>
>>
>>
>>
>>
