Date: Tue, 26 Apr 2016 13:05:22 -0400
From: Igor Kravzov
To: users@nifi.apache.org
Subject: Re: ReplaceText processor configuration help

Hi Matt,

You described an interesting process. I will think about it.
Initially I wanted to grab just some properties, like "entities" and "text", of the original JSON and create a new one.
The ReplaceText approach works fine as long as the "text" value does not contain quotes. Once it does and goes through the replace processor, the JSON becomes broken.

I'm also looking at alternatives, such as using Groovy for JSON-to-JSON conversion, but I'm not sure how StandardCharsets.UTF_8 will work with multi-byte languages.
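Parsing the document instead of editing it as text avoids the quoting problem, and decoding the bytes as UTF-8 (what `StandardCharsets.UTF_8` does on the JVM) handles multi-byte characters. Below is a minimal sketch of the parse-and-re-serialize idea, written in Python as a stand-in for a Groovy script (an assumption for illustration only); `project_tweet` is a hypothetical helper, not a NiFi API, and the sample tweet is made up.

```python
import json


def project_tweet(raw: bytes, keys=("entities", "text")) -> bytes:
    """Keep only selected top-level fields of a tweet and re-serialize them.

    The input is parsed rather than edited as text, so quotes inside values
    (such as the tweet "text") are escaped correctly in the output.
    """
    tweet = json.loads(raw.decode("utf-8"))  # bytes -> str -> dict
    subset = {k: tweet[k] for k in keys if k in tweet}
    # ensure_ascii=False writes non-ASCII characters as-is rather than \uXXXX
    return json.dumps(subset, ensure_ascii=False).encode("utf-8")


# Made-up sample tweet with quotes and multi-byte characters in "text"
sample = {
    "id": 1,
    "text": 'He wrote "こんにちは" to Zoë',
    "entities": {"hashtags": [{"text": "Apple", "indices": [45, 51]}]},
}
raw = json.dumps(sample, ensure_ascii=False).encode("utf-8")
print(project_tweet(raw).decode("utf-8"))
```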


On Tue, Apr 26, 2016 at 12:11 PM, Matt Burgess <mattyb149@gmail.com> wrote:
Yes, I think you'll be better off with Aldrin's suggestion of
ReplaceText. Then you can put the value of the attribute(s) directly
into the content.  For example, if you have two attributes "entities"
and "users", and you want a JSON doc with those two objects inside,
you can use ReplaceText with the following for replacement:

{"entities": ${entities}, "users": ${users}}

Note this "manually" transforms the JSON. Before we get the
TransformJSON processor, this is a decent workaround if you know what
the resulting JSON document should look like (and if you have
attributes containing the desired values).
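A rough Python model of what that replacement does, showing why it only works when the attributes already hold JSON text. `render` and the attribute values are hypothetical; this is not NiFi's Expression Language implementation, just a stand-in for a plain `${name}` reference that pastes the raw value in without escaping.

```python
import json
import re


def render(template: str, attributes: dict) -> str:
    """Replace each ${name} with the raw attribute value (no escaping)."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes[m.group(1)], template)


# Hypothetical attribute values holding JSON text
# (e.g. from EvaluateJsonPath with Return Type = json)
attributes = {
    "entities": '{"hashtags":[{"text":"Apple","indices":[45,51]}]}',
    "users": '{"screen_name":"example"}',
}
doc = render('{"entities": ${entities}, "users": ${users}}', attributes)
print(json.loads(doc)["entities"]["hashtags"][0]["text"])  # Apple

# A plain string value is only safe if it is already a quoted, escaped JSON
# literal; json.dumps produces one. Pasting the raw text would break the JSON.
attributes["text"] = json.dumps('He said "hi"')
print(render('{"text": ${text}}', attributes))
```

In a flow, the equivalent of that `json.dumps` step would have to happen before the value is pasted in; otherwise a quote in the tweet text ends the JSON string early.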

If you're doing this to insert into Elasticsearch, you might want to
handle entities and users separately and have "types" in ES for
"entities" and "users". In that case you could use EvaluateJsonPath to
get both attributes out, then wire the "success" relationship to two
different ReplaceTexts, one to store the entities and one for users.
Then you could add an attribute called "es.type" (for example), set to
"entities" and "users" respectively. Then you can send both forks to a
PutElasticsearch, setting the Type property to "${es.type}". That will
put the entities documents into the entities type and the same for
users. This will help with indexing versus one huge document.
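The routing described above can be modelled in a few lines of Python. FlowFiles are represented as plain dicts, and `fork_by_type` and `resolve` are hypothetical helpers; the point is only that the Type property `${es.type}` is evaluated per FlowFile, so each branch lands in its own type. Mapping the tweet's `user` field to the `users` type is an assumption.

```python
import json


def fork_by_type(tweet: dict) -> list:
    """Model of the two ReplaceText branches: one FlowFile-like dict per
    branch, carrying its content and an "es.type" attribute."""
    branches = (("entities", "entities"), ("user", "users"))  # (JSON field, ES type)
    return [
        {"attributes": {"es.type": es_type}, "content": json.dumps(tweet[field])}
        for field, es_type in branches
        if field in tweet
    ]


def resolve(prop: str, attributes: dict) -> str:
    """Evaluate a property such as "${es.type}" against one FlowFile's attributes."""
    for name, value in attributes.items():
        prop = prop.replace("${" + name + "}", value)
    return prop


tweet = {"entities": {"hashtags": []}, "user": {"screen_name": "example"}}
for ff in fork_by_type(tweet):
    print(resolve("${es.type}", ff["attributes"]), ff["content"])
```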

This process can be broken down into individual entities and users, if
you'd like a separate ES document for each. In that case you'd likely
need a SplitJson after the ReplaceText, pointing at the array of
entity/user objects. Then you'll get a flow file per entity/user,
meaning you'll get a separate ES doc for each entity and user,
stored/indexed/categorized by its type.
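A minimal Python sketch of the split step, assuming the ReplaceText output holds a JSON array under a known key (the `hashtags` sample is made up). `split_json` is a hypothetical helper; the real SplitJson selects the array with a JsonPath expression rather than a plain key, and children inheriting the parent's attributes is what carries `es.type` through.

```python
import json


def split_json(flowfile: dict, array_key: str) -> list:
    """Rough model of SplitJson: one child per element of the array found
    under `array_key`. Children copy the parent's attributes."""
    items = json.loads(flowfile["content"])[array_key]
    return [
        {"attributes": dict(flowfile["attributes"]), "content": json.dumps(item)}
        for item in items
    ]


parent = {
    "attributes": {"es.type": "entities"},
    "content": json.dumps({"hashtags": [{"text": "Apple"}, {"text": "AAPL"}]}),
}
for child in split_json(parent, "hashtags"):
    print(child["attributes"]["es.type"], child["content"])
```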

Does this help solve your use case? If not please let me know, I'm
happy to help work through this :)

Regards,
Matt

On Tue, Apr 26, 2016 at 11:51 AM, Igor Kravzov <igork.inexso@gmail.com> wrote:
> I see.
> But I think I found the problem: it's AttributesToJson that escapes the result.
>
> On Apr 26, 2016 11:46 AM, "McDermott, Chris Kevin (MSDU -
> STaTS/StorefrontRemote)" <chris.mcdermott@hpe.com> wrote:
>>
>> Hi Igor,
>>
>> jsonPath will return JSON as an unescaped String.
>>
>> Chris
>>
>> From: Igor Kravzov <igork.inexso@gmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Monday, April 25, 2016 at 2:27 PM
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: ReplaceText processor configuration help
>>
>> Hi Chris,
>>
>> How will it help in my situation?
>>
>> On Mon, Apr 25, 2016 at 1:50 PM, McDermott, Chris Kevin (MSDU -
>> STaTS/StorefrontRemote) <chris.mcdermott@hpe.com> wrote:
>> Igor,
>>
>> I think the jsonPath extension to the EL is going to be the ticket [1].  A
>> patch is available if you are willing to build NiFi yourself to test it out.
>>
>> Cheers,
>> Chris
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1660
>>
>>
>> From: Igor Kravzov <igork.inexso@gmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Monday, April 25, 2016 at 11:45 AM
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: ReplaceText processor configuration help
>>
>> Aldrin,
>>
>> The overall goal is to extract a subset of attributes from the tweet's
>> JSON, create a new JSON document, and ingest it into Elasticsearch for indexing.
>> Hope this helps.
>>
>> On Mon, Apr 25, 2016 at 11:18 AM, Aldrin Piri
>> <aldrinpiri@gmail.com>
>> wrote:
>> Igor,
>>
>> Thanks for the template.  It looks like the trouble is with
>> AttributesToJSON converting the attribute, which in your case is a JSON
>> blob, into additional JSON, and thus the escaping to ensure nothing is lost.
>> Are you just trying to get that entity body out to a file?  If so, the
>> AttributesToJSON is likely not needed and you should be able to use
>> something like ReplaceText to write the attribute to the FlowFile body.
>> Please let us know your overall goal and we can see if the right mix of
>> components already exists or if we are running into a path that may need
>> some additional functionality.
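The escaping described here is ordinary JSON string encoding. A small Python illustration using only the standard `json` module; it models the behavior described in this thread rather than AttributesToJSON's actual code, and the attribute value is a trimmed sample from the tweet quoted further down.

```python
import json

# Attribute value as EvaluateJsonPath (Return Type = json) stores it:
# the JSON text of the sub-structure, held as an ordinary string.
entities_attr = '{"hashtags":[{"text":"Apple","indices":[45,51]}]}'

# Serializing that string as an attribute *value* encodes it as a JSON string,
# so every inner quote is backslash-escaped.
as_string = json.dumps({"entities": entities_attr})
print(as_string)
# {"entities": "{\"hashtags\":[{\"text\":\"Apple\",\"indices\":[45,51]}]}"}

# Pasting the text into a template instead (the ReplaceText route) keeps it
# a nested object.
as_object = '{"entities": ' + entities_attr + '}'
print(json.loads(as_object)["entities"]["hashtags"][0]["text"])  # Apple
```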
>>
>> Thanks!
>> Aldrin
>>
>>
>>
>> On Mon, Apr 25, 2016 at 10:33 AM, Igor Kravzov
>> <igork.inexso@gmail.com>
>> wrote:
>> Hi Aldrin,
>>
>>
>> Attached please find the template.  In this workflow I want to pull the
>> "entities" and "user" entries from the Twitter JSON as entire structures. I can
>> only do it if I set Return Type to JSON.
>> Subsequently I use AttributesToJSON to create a new JSON file, but the
>> returned values for "entities" and "user" are escaped, so I had to clean
>> these before converting to JSON.
>>
>> Hope this helps.
>>
>> On Mon, Apr 25, 2016 at 10:15 AM, Aldrin Piri
>> <aldrinpiri@gmail.com>
>> wrote:
>> Hi Igor,
>>
>> That should certainly be possible.  Would you mind opening up a ticket
>> (https://issues.apache.org/jira/browse/NIFI) and providing a template of
>> your flow that is causing the issue?
>>
>> Thanks!
>>
>> On Mon, Apr 25, 2016 at 10:09 AM, Igor Kravzov
>> <igork.inexso@gmail.com>
>> wrote:
>> Thanks Pierre. It worked. Looks like I was doing something wrong inside my
>> workflow.
>> Wouldn't it be feasible for the EvaluateJsonPath processor
>> to have an option to return an escaped or unescaped JSON result?
>>
>> On Mon, Apr 25, 2016 at 7:20 AM, Pierre Villard
>> <pierre.villard.fr@gmail.com>
>> wrote:
>> Hi Igor,
>>
>> Please use ReplaceText processors.
>>
>> 1.
>> Search value : \\
>> Replace value : Empty string set
>>
>> 2.
>> Search value : "\{
>> Replace value : \{
>>
>> 3.
>> Search value : \}"
>> Replace value : \}
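These three replacements can be checked against Igor's sample with a short Python model. NiFi's ReplaceText uses Java regular expressions, where the replacement `\{` yields a literal `{`; Python's `re.sub` would keep that backslash, so the sketch writes the braces unescaped. `clean_escaped_json` is a hypothetical name. The second half shows the limitation mentioned at the top of this message: removing every backslash also removes the escapes on quotes inside values.

```python
import json
import re


def clean_escaped_json(text: str) -> str:
    """Apply the three replacements above, in order (Python `re` model)."""
    text = re.sub(r"\\", "", text)    # 1. drop every backslash
    text = re.sub(r'"\{', "{", text)  # 2. "{  ->  {
    text = re.sub(r'\}"', "}", text)  # 3. }"  ->  }
    return text


# Igor's "before" value, without the trailing comma
before = r'"{\"hashtags\":[{\"text\":\"Apple\",\"indices\":[45,51]}],\"urls\":[{\"url\":\"\",\"expanded_url\":\"\",\"display_url\":\"owler.us/abdLas\",\"indices\":[64,87]}],\"user_mentions\":[],\"symbols\":[{\"text\":\"AAPL\",\"indices\":[88,93]}]}"'
print(json.loads(clean_escaped_json(before))["symbols"][0]["text"])  # AAPL

# Step 1 also strips the escapes that protect quotes inside values,
# so a "text" containing quotes comes out as invalid JSON.
quoted = json.dumps(json.dumps({"text": 'say "hi"'}))
try:
    json.loads(clean_escaped_json(quoted))
except json.JSONDecodeError:
    print("invalid JSON:", clean_escaped_json(quoted))
```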
>>
>> Template example attached.
>>
>> HTH
>> Pierre
>>
>>
>> 2016-04-24 20:12 GMT+02:00 Igor Kravzov
>> <igork.inexso@gmail.com>:
>>
>> I am not that good at regex. What would be the proper configuration to do
>> the following:
>>
>>   1.  Remove backslashes from the text.
>>   2.  Replace "{ with {
>>   3.  Replace }" with }
>>
>> Basically I need to clean escaped JSON.
>>
>> Like before:
>>
>>
>> "{\"hashtags\":[{\"text\":\"Apple\",\"indices\":[45,51]}],\"urls\":[{\"url\":\"\",\"expanded_url\":\"\",\"display_url\":\"owler.us/abdLas\",\"indices\":[64,87]}],\"user_mentions\":[],\"symbols\":[{\"text\":\"AAPL\",\"indices\":[88,93]}]}",
>>
>> after:
>>
>>
>> {"hashtags":[{"text":"Apple","indices":[45,51]}],"urls":[{"url":"","expanded_url":"","display_url":"owler.us/abdLas","indices":[64,87]}],"user_mentions":[],"symbols":[{"text":"AAPL","indices":[88,93]}]},
>>
>> Thanks in advance.
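For comparison, the cleanup requested here amounts to decoding a JSON string literal whose content is itself JSON. A minimal sketch of that swapped-in technique, using a JSON parser instead of regexes (Python's standard `json`; `decode_escaped_json` is a hypothetical helper, not something NiFi provides):

```python
import json


def decode_escaped_json(literal: str):
    """Decode a JSON string literal whose content is itself JSON.

    The first json.loads undoes the string escaping (backslash-quote becomes
    quote) and returns the inner JSON text; the second parses that text.
    """
    return json.loads(json.loads(literal))


# Excerpt of the "before" value above, minus the trailing comma
before = r'"{\"hashtags\":[{\"text\":\"Apple\",\"indices\":[45,51]}],\"user_mentions\":[]}"'
print(json.dumps(decode_escaped_json(before), separators=(",", ":")))
# {"hashtags":[{"text":"Apple","indices":[45,51]}],"user_mentions":[]}

# Unlike stripping backslashes, this keeps quotes inside values intact.
quoted = json.dumps(json.dumps({"text": 'say "hi"'}))
print(decode_escaped_json(quoted)["text"])  # say "hi"
```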