nifi-users mailing list archives

From Matthew Clarke <matt.clarke....@gmail.com>
Subject Re: ExtractText processor
Date Wed, 24 Feb 2016 14:55:06 GMT
Sudeep,
       You need to be cautious when extracting the entire contents of a file
into an attribute.  Attributes are stored in JVM memory.  Having
exceptionally large attributes will consume considerable amounts of that
memory. To use the ExtractText processor to grab the entire content, you
first need to set/adjust the following properties:

- Maximum Buffer Size   <-- default is 1 MB but needs to be large enough to
accommodate the entire file.
- Maximum Capture Group Size  <-- in your case, since your capture group
will be the entire file, this also must be large enough to handle the entire
content.  If set too low, any characters beyond the configured size will be
truncated.
- Enable DOTALL Mode   <-- needs to be set to true so that line returns are
matched by your capture group as well.
- Include Capture Group 0  <-- you should set this to False to lessen your
JVM memory footprint here.
- Finally you need to add a "New property" which will contain your capture
group
     - for example:
property name: MyContent
value: (.*)

The above value is a Java regular expression contained in a capture group.
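
To see why DOTALL matters for that expression, here is a small standalone
sketch using plain java.util.regex. It is not NiFi code, and the class and
method names (DotAllDemo, capture) are made up for illustration. It only
shows how (.*) behaves with and without the DOTALL flag on multi-line
content; it assumes the processor's "Enable DOTALL Mode" property
corresponds to this standard flag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; illustrates regex behaviour only, not ExtractText itself.
final class DotAllDemo {

    // Returns capture group 1 of the first match of (.*), or null if nothing matches.
    static String capture(String content, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        Matcher m = Pattern.compile("(.*)", flags).matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "line one\nline two";

        // Without DOTALL, '.' stops at the first line terminator.
        System.out.println("[" + capture(content, false) + "]");

        // With DOTALL, '.' also matches line terminators, so the group spans all lines.
        System.out.println("[" + capture(content, true) + "]");
    }
}
```

Without the flag the group holds only the first line; with it the group
holds the full text, which is what you want when the attribute should carry
the entire content.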

Matt

On Wed, Feb 24, 2016 at 9:22 AM, sudeep mishra <sudeepshekharm@gmail.com>
wrote:

> Hi,
>
> Can someone please guide how to use the ExtractText processor to add
> entire flowfile content to an attribute?
>
>
> Thanks & Regards,
>
> Sudeep
>
