nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Splunk Processor - Re-play
Date Tue, 14 Nov 2017 14:48:38 GMT
"Provided" will fetch the same specific time interval on every run, so
what you are seeing is expected behavior. To continue from there, you
would have to stop the processor and change it to one of the "Managed"
strategies.
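
To illustrate the difference, here is a minimal sketch in Python. It is a toy model only, not GetSplunk's actual implementation; the class name, field names, and integer timestamps are all made up for the example. It assumes that "provided" always returns the configured interval and that "managed" continues from the end of the previous run, with `None` standing in for "from the beginning".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TimeRangeModel:
    """Hypothetical toy model of the two strategies (not NiFi code)."""
    strategy: str                         # "provided" or "managed"
    earliest: Optional[int] = None        # used only by "provided"
    latest: Optional[int] = None          # used only by "provided"
    stored_latest: Optional[int] = None   # stands in for processor state

    def next_range(self, now: int) -> Tuple[Optional[int], Optional[int]]:
        if self.strategy == "provided":
            # Same fixed interval on every run, so the results repeat.
            return (self.earliest, self.latest)
        # "managed": start where the previous run stopped, then
        # remember the new end point for the next run.
        start = self.stored_latest
        self.stored_latest = now
        return (start, now)


provided = TimeRangeModel("provided", earliest=100, latest=200)
print(provided.next_range(now=300))  # (100, 200)
print(provided.next_range(now=400))  # (100, 200) again: duplicates

managed = TimeRangeModel("managed")
print(managed.next_range(now=300))   # (None, 300)
print(managed.next_range(now=400))   # (300, 400)
```

In this model, clearing the stored value is the only way to move a "managed" run backwards, which is why an "initial start time" property would be needed to resume from a specific point.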

Pierre suggested adding an "initial start time" property that would go
with the "managed" strategy, so that you could set a specific start
time instead of only the beginning or the current time.

Until that is implemented, you can't do what you are describing.
Please feel free to create a JIRA for this enhancement.

-Bryan


On Tue, Nov 14, 2017 at 9:41 AM, Sivakumar, S <sivakumar_s@intuit.com> wrote:
> Hi,
> Thanks for your answer. I have one more question: if I use "Provided" as
> the strategy with an earliest and latest time range, I get the same
> records duplicated many times. How do I handle this? I also have a
> requirement that once the records in the provided time range have been
> processed, the processor should continue with the current-time strategy.
>
> Siva
>
> On Nov 14, 2017, at 7:34 PM, Bryan Bende <bbende@gmail.com> wrote:
>
> Pierre is correct...
>
> Currently you can reset the state back to the beginning by right-clicking on
> the processor and selecting View State and then Clear State.
>
> From there you could use "Managed from Beginning" to start over from the
> beginning, but there is no way to start at a specific point in time, only
> the beginning or the current time.
>
> On Tue, Nov 14, 2017 at 3:33 AM, Pierre Villard
> <pierre.villard.fr@gmail.com> wrote:
>>
>> Hi Siva,
>>
>> The processor is storing a "state" in the state management back-end of
>> NiFi (Zookeeper usually). There is no way for you to edit this value.
>> However, some processors expose a property allowing you to manually set the
>> initial value (for example, GenerateTableFetch [1]) that the processor will
>> use when it is started. I don't know the Splunk processors, but I believe
>> this is what you'd like. If so, I can only suggest that you submit a JIRA
>> asking for this feature [2].
>>
>> [1]
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.4.0/org.apache.nifi.processors.standard.GenerateTableFetch/index.html
>> [2] https://issues.apache.org/jira/projects/NIFI (you need to register
>> first)
>>
>> Pierre
>>
>> 2017-11-14 3:44 GMT+01:00 Sivakumar, S <sivakumar_s@intuit.com>:
>>>
>>> Hello,
>>>
>>> Even when I provide Earliest Time and Latest Time with the “Provided”
>>> strategy, it causes duplicate data in the system. The same set of records
>>> is repeated every time the workflow runs.
>>>
>>>
>>>
>>> My question is: even if I specify the "Managed" time strategy, how do I
>>> go back to a certain time range (basically re-pulling the same records
>>> that were already pulled) by changing the stored value that the processor
>>> refers to, wherever it lives in the system? Could it be in the persistent
>>> provenance repo? How do I tweak the time range values in that repo?
>>>
>>>
>>>
>>> -Siva
>>>
>>>
>>>
>>> From: Bryan Bende <bbende@gmail.com>
>>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> Date: Monday, November 13, 2017 at 11:57 PM
>>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> Cc: "Bharani, Manish" <Manish_Bharani@intuit.com>
>>> Subject: Re: Splunk Processor - Re-play
>>>
>>>
>>>
>>> Hello,
>>>
>>> If you want to specify Earliest Time and Latest Time, then you need to
>>> change Time Range Strategy to "Provided".
>>>
>>> The "Managed" time range strategies are meant to let the processor
>>> calculate the time ranges for you on each execution, and you cannot specify
>>> time ranges when using those strategies.
>>>
>>> -Bryan
>>>
>>>
>>>
>>> On Mon, Nov 13, 2017 at 9:03 AM, Sivakumar, S <sivakumar_s@intuit.com>
>>> wrote:
>>>
>>> Hi Folks,
>>>
>>> I am a newbie to NiFi. I am using the GetSplunk 1.4.0 processor to pull
>>> data from Splunk. I have managed to pull the data for T-3, but I want to
>>> replay the flow and pull the data again with some additional
>>> transformation added to the Splunk query.
>>>
>>>
>>>
>>> I have the two problems below:
>>>
>>>
>>>
>>> 1. No data is pulled in the flow.
>>>
>>> 2. If I change the “Time Range Strategy” to:
>>>
>>>                 a. Provided, the SAME data is pulled repeatedly for as
>>> long as the flow runs,
>>>
>>>                 b. Managed from Beginning, a huge volume of data is
>>> pulled.
>>>
>>>
>>>
>>> Please advise me how to replay the flow from a point of my choosing and
>>> continue from that point onwards.
>>>
>>>
>>>
>>>
>>>
>>> <image001.png>
>>>
>>>
>>>
>>> -Siva
>>>
>>>
>>
>>
>
