nifi-users mailing list archives

From Ryan Ward <ryan.wa...@gmail.com>
Subject Re: NiFi flow provides 0 output on large files
Date Fri, 25 Sep 2015 13:52:07 GMT
This is actually very easy to overlook. Oftentimes we change the file
expiration on a queue simply to empty the queue.

Could we add a right-click "empty queue" option, with an "are you sure"
prompt? Is there already a JIRA for this feature?

Thanks,
Ryan

On Fri, Sep 25, 2015 at 9:12 AM, Jeff <j.007ba7@gmail.com> wrote:

>
> That was a rookie mistake.
>
> Indeed the JSON_to_Avro queue was set to 5 sec.  Is there information in a
> log that states a flow file was expired?
>
> My ultimate goal is to put all of this data into a Confluent Kafka topic,
> taking advantage of the schema registry. I do not believe the current
> PutToKafka provides the ability to use this registry, correct?  I’m curious
> if anyone is working on a PutToConfluentKafka processor?
>
> Thanks for your help.
>
> Jeff
>
> On Sep 25, 2015, at 7:52 AM, Matt Gilman <matt.c.gilman@gmail.com> wrote:
>
> Jeff,
>
> What is the expiration setting on your connections? The little clock icon
> indicates that they are configured to automatically expire flowfiles of a
> certain age.
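Conceptually, age-based expiration just drops any flowfile whose time in the queue exceeds the configured period. A minimal Python sketch of that rule (names and structure here are illustrative only, not NiFi's actual implementation):

```python
import time

def expired(entry_time, expiration_secs, now=None):
    """Age-based expiration rule: a flowfile whose age exceeds the period
    configured on its connection (e.g. "5 sec") is dropped from the queue.
    Conceptual sketch only, not NiFi's actual implementation."""
    if now is None:
        now = time.time()
    return (now - entry_time) > expiration_secs

now = time.time()
print(expired(now - 10, 5, now))  # queued 10 s ago on a "5 sec" connection -> True
print(expired(now - 1, 5, now))   # queued 1 s ago, still within the period -> False
```

With a 5-second expiration and a slow downstream processor, large files that take longer than 5 seconds to traverse the queue are silently dropped, which matches the symptom described in this thread.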
>
> Matt
>
> On Fri, Sep 25, 2015 at 8:50 AM, Jeff <j.007ba7@gmail.com> wrote:
>
>>
>> Hi Aldrin,
>>
>> After the DDA_Processor
>>
>> The below image shows that the GetFile processor processed 174.6 MB and
>> the DDA_Processor is working on 1 file (the 1 in the upper right of the
>> DDA_Processor box).
>>
>> <unknown.gif>
>>
>> The below image shows that the DDA_Processor is complete but data did not
>> make it to ConvertJSONtoAvro.  No errors are being generated.
>> DDA_Processor takes fixed width data and converts it to JSON.
>>
>> <unknown.gif>
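For readers unfamiliar with that kind of processor: the fixed-width-to-JSON conversion itself is simple slicing. A minimal Python sketch (the column layout and field names below are invented for illustration; the real DDA_Processor is a custom NiFi processor whose record layout is not shown in this thread):

```python
import json

# Hypothetical fixed-width layout: (field name, start offset, end offset).
# The real DDA record layout is not shown anywhere in this thread.
LAYOUT = [("account", 0, 10), ("amount", 10, 18), ("memo", 18, 30)]

def row_to_json(row):
    """Slice one fixed-width row into named fields and emit one JSON object."""
    record = {name: row[start:end].strip() for name, start, end in LAYOUT}
    return json.dumps(record)

# One 30-byte row: 10-char account, 8-char amount, 12-char memo
print(row_to_json("0001234567  123.45groceries   "))
```

Each input row yields one JSON object, so output volume scales linearly with row count and the conversion itself gives no reason for a cutoff around 15,000 rows.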
>>
>> Thanks
>>
>>
>> On Sep 25, 2015, at 7:30 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>
>> Jeff,
>>
>> With regards to:
>>
>> "Anything over, the GetFile and DDA_Processor show data movement but
>> no other downstream processor shows movement."
>>
>> Are you referencing downstream processors starting immediately after the
>> DDA_Processor (ConvertJsonToAvro) or starting immediately after the
>> ConvertJsonToAvro processor?
>>
>> In the case of starting immediately after the DDA Processor, as it is a
>> custom processor, we would need some additional information as to how this
>> processor is behaving.  In the second case, some additional context on the
>> format of the data that is problematic (the effective "schema" of the
>> JSON) would be helpful in tracking down the issue.
>>
>> Thanks!
>> Aldrin
>>
>> On Fri, Sep 25, 2015 at 8:22 AM, Jeff <j.007ba7@gmail.com> wrote:
>>
>>> Hi Adam,
>>>
>>>
>>> I have a flow that does the following;
>>>
>>> GetFile > DDA_Processor > ConvertJSONToAvro > UpdateAttribute > PutFile
>>>
>>> My source file has 182897 rows at 1001 bytes per row.  If I process any
>>> number of rows under ~15000, an output file is created.  Anything over,
>>> the GetFile and DDA_Processor show data movement but no other downstream
>>> processor shows movement.
>>>
>>> I confirmed that it is not a data problem by processing a 10,000 row
>>> file successfully, then concatenating that 10,000-row file with itself
>>> into one larger file.
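That reproduction can be scripted. A sketch in Python (the 1001-byte row width comes from the thread; the file names and row contents are placeholders):

```python
import os

# Rebuild the reproduction described above: a known-good 10,000-row file,
# then that same file concatenated with itself. Each row is 1001 bytes
# (1000 data bytes plus a newline), matching the row width in the thread;
# the row contents here are placeholders.
row = b"x" * 1000 + b"\n"

with open("ten_k_rows.txt", "wb") as f:
    for _ in range(10_000):
        f.write(row)

# Concatenate the known-good file with itself to produce the larger input
with open("twenty_k_rows.txt", "wb") as out:
    for src in ("ten_k_rows.txt", "ten_k_rows.txt"):
        with open(src, "rb") as f:
            out.write(f.read())

print(os.path.getsize("twenty_k_rows.txt"))  # 20,000 rows x 1001 bytes
```

Because the doubled file is byte-for-byte two copies of data that already processed cleanly, any failure on it points at size (or timing, such as queue expiration) rather than content.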
>>>
>>> Thanks for your insight.
>>>
>>> Jeff
>>> <Mail Attachment.gif>
>>>
>>>
>>> On Sep 24, 2015, at 8:40 PM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>>
>>> Jeff,
>>>
>>> This seems to be a bit different as the processor is showing data as
>>> having been written and there is a listing of one FlowFile of 381 MB being
>>> transferred out from the processor.  Could you provide additional
>>> information as to how data is not being sent out in the manner
>>> anticipated?  If you can track the issue down more, let us know.  It may
>>> be helpful to create another message so we can track the issues
>>> separately as we work through them.
>>>
>>> Thanks!
>>>
>>> Adam,
>>>
>>> Found a sizable JSON file to work against and have been doing some
>>> initial exploration.  With the large files, it certainly is a nontrivial
>>> process.  At cursory inspection, a good portion of processing seems to be
>>> spent on validation.  There are some ways to tweak the strictness of
>>> this with the supporting library, but I will have to dive in a bit more.
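To make that cost concrete: strict conversion validates every record before writing it, so validation work grows linearly with row count. A simplified Python illustration (this stands in for the Avro library's checks, which are far more involved and are not shown here):

```python
import json

# A stand-in "schema": required fields and expected Python types. This is
# only an illustration of per-record strictness, not real Avro validation.
SCHEMA = {"account": str, "amount": str}

def validate(record):
    """Strict mode: every schema field must be present and well-typed."""
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in SCHEMA.items()
    )

records = [json.loads(line) for line in (
    '{"account": "0001", "amount": "5.00"}',
    '{"account": "0002"}',  # missing "amount"
)]
print([validate(r) for r in records])  # strict checking flags the second record
```

Relaxing strictness trades this per-record work for the risk of malformed records reaching the Avro output, which is the tuning question raised above.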
>>>
>>>
>>>
>>> On Thu, Sep 24, 2015 at 8:14 PM, Jeff <j.007ba7@gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> I’m having a very similar problem.  The process picks up the file, and
>>>> a custom processor does its thing, but no data is sent out.
>>>>
>>>> <unknown.gif>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
