Date: Wed, 24 Feb 2016 09:55:06 -0500
From: Matthew Clarke
To: users@nifi.apache.org
Subject: Re: ExtractText processor

Sudeep,
You need to be cautious when extracting the entire contents of a file into an attribute. Attributes are stored in JVM memory, so exceptionally large attributes will consume considerable amounts of that memory. To use the ExtractText processor to grab the entire content, you first need to set or adjust the following properties:

- Maximum Buffer Size <-- the default is 1 MB, but it needs to be large enough to accommodate the entire file.
- Maximum Capture Group Length <-- in your case, since your capture group will be the entire file, this must also be large enough to handle the entire content. If it is set too low, characters beyond the set length will be truncated.
- Enable DOTALL Mode <-- needs to be set to true so that line returns are matched by your capture group as well.
- Include Capture Group 0 <-- you should set this to false to lessen your JVM memory footprint here. Capture group 0 is the entire match, which in this case would be a second copy of the content.
- Finally, you need to add a "New property" which will contain your capture group
      - for example:
        property name: MyContent
        value: (.*)

The above value is a Java regular expression contained in a capture group.
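The effect of the DOTALL, capture group 0, and capture group length settings on the (.*) value can be checked with plain java.util.regex, outside NiFi. This is a minimal sketch assuming that ExtractText's "Enable DOTALL Mode" corresponds to Java's Pattern.DOTALL flag, as its name suggests. The class CaptureAllDemo and its captureGroup and truncate helpers are hypothetical names for illustration, and truncate is only a simplified model of the length limit, not NiFi's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration with plain java.util.regex (not NiFi code; all names are hypothetical):
// shows why "(.*)" needs DOTALL to span multi-line content, and why
// capture group 0 duplicates capture group 1 for this pattern.
public class CaptureAllDemo {

    // Applies "(.*)" to the content and returns the requested capture group,
    // or null if there is no match.
    static String captureGroup(String content, int group, boolean dotAll) {
        Pattern pattern = dotAll
                ? Pattern.compile("(.*)", Pattern.DOTALL)
                : Pattern.compile("(.*)");
        Matcher matcher = pattern.matcher(content);
        // find() returns the first match; without DOTALL, '.' does not match
        // line terminators, so the match ends at the first line break.
        return matcher.find() ? matcher.group(group) : null;
    }

    // Hypothetical, simplified model of a maximum capture group length:
    // keep at most maxLength characters and drop the rest.
    static String truncate(String value, int maxLength) {
        return value.length() > maxLength ? value.substring(0, maxLength) : value;
    }

    public static void main(String[] args) {
        String content = "first line\nsecond line\nthird line";

        String withoutDotAll = captureGroup(content, 1, false);
        String withDotAll = captureGroup(content, 1, true);
        String groupZero = captureGroup(content, 0, true);

        // Prints only "first line": the match stops at the first line break.
        System.out.println("without DOTALL: [" + withoutDotAll + "]");
        // Prints the whole content, with line breaks shown as \n.
        System.out.println("with DOTALL:    [" + withDotAll.replace("\n", "\\n") + "]");
        // For "(.*)" the whole match equals group 1, so storing both keeps two copies.
        System.out.println("group 0 equals group 1: " + groupZero.equals(withDotAll));
        System.out.println("limited to 10 chars: [" + truncate(withDotAll, 10) + "]");
    }
}
```

Without DOTALL the pattern captures only the first line; with it, group 1 holds the full text and group 0 is an identical second copy, which is why leaving capture group 0 out saves memory when the whole content is captured.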

Matt

On Wed, Feb 24, 2016 at 9:22 AM, sudeep mishra <sudeepshekharm@gmail.com> wrote:
> Hi,
>
> Can someone please guide how to use the ExtractText processor to add entire flowfile content to an attribute?
>
>
> Thanks & Regards,
>
> Sudeep
