From: Gunjan Dave
Date: Sat, 03 Sep 2016 02:07:53 +0000
Subject: Re: Processor to enrich attribute from external service
To: users@nifi.apache.org

How I have handled this personally is to wrap the SQL processors with a HandleHttpRequest processor, essentially exposing the DB operation as a REST web service.

Then you have the option of having the fetchhttp processor append the results to an attribute instead of the content, which is an option already available.

With MongoDB, you need not do this additional wrapping, as it has a REST interface, so use that directly in the HTTP processor.


On Sat, Sep 3, 2016, 4:28 AM Matt Burgess <mattyb149@apache.org> wrote:
Agreed. Additionally, if we want to get fancy, we can work with
incoming flow files based on MIME type (JSON, XML, CSV) and have a
"Path" property to a field in the document. Then the processor could
replace inline the value in the document with the lookup value. If XML
files are coming in, the Path is an XPath expression. Same for JSON
and JSONPath, and CSV could be a column index (0-based, e.g.).
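
To make the "Path" idea concrete, here is a minimal sketch in plain Python (not NiFi code). The function name replace_with_lookup and the lookup callable are hypothetical, the JSON branch handles only simple "$.a.b" key paths rather than full JSONPath, and the XML branch relies on the limited XPath subset of Python's ElementTree.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def replace_with_lookup(content, mime_type, path, lookup):
    """Replace the value addressed by `path` with lookup(value); return new text."""
    if mime_type == "application/json":
        doc = json.loads(content)
        # Simplified stand-in for JSONPath: "$.a.b" walks nested object keys.
        keys = (path[2:] if path.startswith("$.") else path).split(".")
        parent = doc
        for key in keys[:-1]:
            parent = parent[key]
        parent[keys[-1]] = lookup(parent[keys[-1]])
        return json.dumps(doc)
    if mime_type == "application/xml":
        root = ET.fromstring(content)
        node = root.find(path)  # ElementTree supports a limited XPath subset
        if node is None:
            raise KeyError(f"no element matches {path!r}")
        node.text = lookup(node.text)
        return ET.tostring(root, encoding="unicode")
    if mime_type == "text/csv":
        column = int(path)  # 0-based column index
        rows = list(csv.reader(io.StringIO(content)))
        for row in rows:
            row[column] = lookup(row[column])
        out = io.StringIO()
        csv.writer(out, lineterminator="\n").writerows(rows)
        return out.getvalue()
    raise ValueError(f"unsupported MIME type: {mime_type}")
```

The lookup callable stands in for whatever external service is queried; a real processor would also need to decide what to do when the path matches nothing or the lookup has no result.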

I have something very similar (not the lookup, but the "Path" thing
for multiple file types) coming soon as a Jira case / PR ;) If that
proves useful, I could move it into a util or base class or something.

Regards,
Matt

On Fri, Sep 2, 2016 at 6:47 PM, Manish Gupta 8 <mgupta50@sapient.com> wrote:
> I think the lookup processor should return data in a format that can be
> efficiently parsed/processed by NiFi expression language. For example –
> JSON. This would avoid using an additional “Extract” type processor. All the
> downstream processors can simply work with “jsonPath” for additional lookup
> inside the attribute.
>
>
>
> Regards,
>
> Manish
>
>
>
> From: Matt Burgess [mailto:mattyb149@gmail.com]
> Sent: Friday, September 02, 2016 6:37 PM
>
>
> To: users@nifi.apache.org
> Subject: Re: Processor to enrich attribute from external service
>
>
>
> Manish,
>
>
>
> Some of the queries in those processors could bring back lots of data, and
> putting them into an attribute could cause memory issues. Another concern is
> when the result is binary data, such as ExecuteSQL returning an Avro file.
> And since the return of these is a collection of records, these processors
> are often followed by a Split processor to perform operations on individual
> records.
>
>
>
> Having said that, if the return value is text and you'd like to transfer it
> to an attribute, you can use ExtractText to put the content into an
> attribute. For small content (which is the appropriate use case), this
> should be pretty fast, and keeps the logic in a single processor instead of
> duplicated (either logically or physically) across processors.
>
>
>
> By the way I'm very interested in an RDBMS lookup processor, but not sure
> I'd have time in the short run to write it up. If someone takes a crack at
> it, I recommend properties to pre-cache the table with a refresh interval.
> This way if the lookup table doesn't change much and is not too big, it
> could be read into the processor's memory for super-fast lookups.
> Alternatively, a property could be a cache size, which would build a subset
> of the table in memory as values are looked up. This is probably more robust
> as it is bounded and if the size is set high enough for a small table, it
> would be read in its entirety. Still would want the cache refresh property
> though.
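
The bounded cache with a refresh interval described above could look roughly like the following plain-Python sketch. LookupCache, loader, max_size and refresh_seconds are hypothetical names chosen for illustration; this is not an existing NiFi API, and the least-recently-used eviction policy is an assumption the thread does not specify.

```python
import time
from collections import OrderedDict


class LookupCache:
    """Bounded LRU cache whose entries go stale after refresh_seconds."""

    def __init__(self, loader, max_size, refresh_seconds, clock=time.monotonic):
        self._loader = loader            # hypothetical: key -> value, e.g. one query per key
        self._max_size = max_size        # the "cache size" property
        self._refresh = refresh_seconds  # the "cache refresh" property
        self._clock = clock
        self._entries = OrderedDict()    # key -> (value, time loaded)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._refresh:
            self._entries.move_to_end(key)  # mark as most recently used
            return entry[0]
        value = self._loader(key)           # miss or stale entry: query the table
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used key
        return value
```

With max_size at least as large as the table, every row ends up cached after its first lookup, which matches the "read in its entirety" case; a smaller max_size keeps memory bounded at the cost of more round trips.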
>
>
>
> Cheers,
>
> Matt
>
>
> On Sep 2, 2016, at 6:19 PM, Manish Gupta 8 <mgupta50@sapient.com> wrote:
>
> Thanks for the reply Joe. Just a thought – do you think it would be a good
> idea for every Get processor (GetMongo, GetHBase etc.) to have 2 additional
> properties like:
>
> 1. Result in Content or Result in Attribute
>
> 2. Result Attribute Name (only applicable when “Result in Attribute” is
> selected).
>
> But then all such processors should be able to accept incoming flowfile
> (which they don’t as of now – being a “Get”).
>
>
>
> Maybe ExecuteSQL and FetchDistributedMapCache can be enhanced that way, i.e.
> have an option to specify the destination – content or attribute?
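
As a rough illustration of the "destination" option proposed above, here is a toy model in plain Python. FlowFile, enrich, destination and result_attribute are hypothetical names; a real processor would go through NiFi's session API rather than mutate an object like this.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi flow file: a byte payload plus string attributes."""
    content: bytes = b""
    attributes: dict = field(default_factory=dict)


def enrich(flow_file, key_attribute, lookup, destination="attribute",
           result_attribute="lookup.result"):
    """Look up the value of key_attribute and store the result where requested."""
    result = lookup(flow_file.attributes[key_attribute])
    if destination == "attribute":
        flow_file.attributes[result_attribute] = result  # content is left untouched
    elif destination == "content":
        flow_file.content = result.encode("utf-8")       # replaces the payload
    else:
        raise ValueError(f"unknown destination: {destination}")
    return flow_file
```

Here lookup stands in for the query against MongoDB, an RDBMS or a cache. As noted elsewhere in the thread, the attribute destination only makes sense for small text results.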
>
>
>
> Regards,
>
> Manish
>
>
>
> From: Joe Witt [mailto:joe.witt@gmail.com]
> Sent: Friday, September 02, 2016 5:58 PM
> To: users@nifi.apache.org
> Subject: Re: Processor to enrich attribute from external service
>
>
>
> You would need to make a custom processor for now. I think we should have a
> nice controller service to generalize JDBC lookups which supports caching.
> And then a processor which leverages it.
>
> This comes up fairly often and is pretty straightforward from a design POV.
> Anyone want to take a stab at this?
>
>
>
> On Sep 2, 2016 4:47 PM, "Manish Gupta 8" <mgupta50@sapient.com> wrote:
>
> Hello Everyone,
>
>
>
> Is there a processor that we can use for updating/adding an attribute of an
> incoming flow file from some external service (say MongoDB or Couchbase or
> any RDBMS)? The processor will use an attribute of the incoming flow file,
> query the external service, and simply modify/add an additional attribute of
> the flow file (without touching the flow file content).
>
>
>
> If we have to achieve this kind of “lookup” operation (but only to update
> an attribute and not the content), what are the options in NiFi?
>
> Should we create a custom processor (maybe by taking the GetMongo processor and
> modifying its code to update an attribute with the query result)?
>
>
>
> Thanks,
>
> Manish
>
>