Mailing-List: contact users-help@nifi.apache.org; run by ezmlm
Precedence: bulk
Reply-To: users@nifi.apache.org
MIME-Version: 1.0
In-Reply-To: <47056F0C-5DB7-4FF9-91AE-08BD308B9413@gmail.com>
References: <CA+PpCqX9iYOT3kU1o9zfGCznOEZZfYUS3AJ2s-80_aNFTagJWQ@mail.gmail.com>
 <CAEV8zdWQPVGirZ2LC_TTCdJ=Ue8TP1=r4_2xG7Qk2MxqOofvXg@mail.gmail.com>
 <CA+PpCqWJnLaTf4Gjg3JtnPQm97fH+xpDftJ9-UdUChy62O5vTg@mail.gmail.com>
 <CAEV8zdXz9jwz_idyX0TwSz0Q4NZeKcCPeq9Hwfsyv-55fo4vEA@mail.gmail.com>
 <CA+PpCqV5VFPMm+9edF2hNSBsCGcbtGqqjokc5zWRoPf8O-O1aw@mail.gmail.com> <47056F0C-5DB7-4FF9-91AE-08BD308B9413@gmail.com>
From: Austin Duncan <aduncan@pyaanalytics.com>
Date: Tue, 22 Aug 2017 13:48:50 -0400
Message-ID: <CA+PpCqU3yP=FTGvzPJzm=QwjijRrHxBZ18fr7Hw_NDT3dpSkBw@mail.gmail.com>
Subject: Re: Upsert
To: users@nifi.apache.org
Content-Type: multipart/alternative; boundary="001a113ce5a0436afc05575b3593"

--001a113ce5a0436afc05575b3593
Content-Type: text/plain; charset="UTF-8"

All of the records have the same schema, though, I think. It's all the same
kind of data in the same format. We were originally using a PutSQL
processor and were having trouble with it. The current setup actually runs
pretty well; I'm just trying to figure out how to do an insert and then, if
a row with the same rfidnumber already exists in the table, do an update
instead.
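
To make the goal concrete, here is a minimal Python sketch of the statement
I think I need, wrapped in a one-field JSON document so that
PutDatabaseRecord (statement.type "sql", Field Containing SQL "query")
could execute it. The table name "inventory", the lower-cased column names,
and the helper names are assumptions for illustration, not our real setup,
and it assumes PostgreSQL 9.5+ with a unique constraint on rfidnumber.

```python
import json

# Hypothetical table name; the thread never names the real table.
TABLE = "inventory"
# Conflict column from the thread; assumes a unique constraint exists on it.
KEY = "rfidnumber"


def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted, quotes doubled)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_upsert(record):
    """Build one INSERT ... ON CONFLICT ... DO UPDATE statement for a record."""
    # Assumption: column names are the lower-cased schema field names.
    columns = [name.lower() for name in record]
    values = [sql_literal(v) for v in record.values()]
    # On conflict, overwrite every non-key column with the incoming value.
    updates = [f"{c} = EXCLUDED.{c}" for c in columns if c != KEY]
    return (
        f"INSERT INTO {TABLE} ({', '.join(columns)}) "
        f"VALUES ({', '.join(values)}) "
        f"ON CONFLICT ({KEY}) DO UPDATE SET {', '.join(updates)}"
    )


def to_query_doc(record):
    """Wrap the statement in a JSON doc with a single "query" field."""
    return json.dumps({"query": build_upsert(record)})


if __name__ == "__main__":
    sample = {"RfidNumber": "A1", "CabinetName": "Cab 1", "PurchaseOrderPrice": 9.5}
    print(to_query_doc(sample))
```

Each flow file would then carry one {"query": "..."} document, read with a
schema that has a single string field named "query". The values are inlined
as literals here, so this is only reasonable for trusted data.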

On Tue, Aug 22, 2017 at 12:05 PM, Matt Burgess <mattyb149@gmail.com> wrote:

> Your use case will be more complex, because you need to form SQL from your
> JSON. You can try ConvertJSONToSQL and then ReplaceText to add the ON
> CONFLICT to the end.
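
If I follow this suggestion, the ReplaceText step only has to append the
conflict clause to the INSERT that ConvertJSONToSQL generates. A minimal
Python sketch of that string change follows. It is not NiFi configuration,
and the function name, the sample table "inventory", and the column list
are assumptions; it also assumes the generated statement uses ?
placeholders, which the appended clause leaves untouched.

```python
def add_on_conflict(insert_sql, key, columns):
    """Append a PostgreSQL ON CONFLICT clause to a generated INSERT statement."""
    # Update every non-key column from the row that failed to insert.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    # Drop a trailing semicolon, if any, so the clause stays inside the statement.
    base = insert_sql.rstrip().rstrip(";").rstrip()
    return f"{base} ON CONFLICT ({key}) DO UPDATE SET {updates}"


if __name__ == "__main__":
    generated = "INSERT INTO inventory (rfidnumber, cabinetname) VALUES (?, ?)"
    print(add_on_conflict(generated, "rfidnumber", ["rfidnumber", "cabinetname"]))
```
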
>
> Also if you are using SplitJson then PutDatabaseRecord won't be as
> efficient as it could be; it is meant to work on a number of records in a
> single flow file with the same schema. In your case you might be better off
> with PutSQL as it won't require schemas or statement types. The Split ->
> convert to SQL -> PutSQL pattern was the original way to do things, but for
> multiple records with the same schema, PutDatabaseRecord was added.
>
> Regards,
> Matt
>
> On Aug 22, 2017, at 11:43 AM, Austin Duncan <aduncan@pyaanalytics.com>
> wrote:
>
> The example you listed sounds like a solution; I am just having trouble
> fully understanding it. I want to write my own SQL so that I can do the
> "insert into on conflict".
> I am having trouble understanding what I have to do so that the query and
> the data will be understood by the processor. Do I need different schemas?
> One like I sent and one for the query?
>
> On Tue, Aug 22, 2017 at 11:38 AM, Matt Burgess <mattyb149@apache.org>
> wrote:
>
>> If your incoming data is already in fields (JSON, e.g.) and not a SQL
>> statement, then the statement.type should be "insert" rather than
>> "sql". The "sql" type is for passing in explicit SQL statements to
>> execute rather than taking a record and generating the appropriate SQL
>> statement from the fields and the statement.type.
>>
>> If you are trying to generate your own SQL and execute that, then
>> you'd want to try something like the solution I outlined before, where
>> you put the SQL in a field such as "statement", set statement.type to
>> "sql" and Field Containing SQL to "statement".
>>
>> Does that make sense? Or am I misunderstanding what you are trying to do?
>>
>> Regards,
>> Matt
>>
>> On Tue, Aug 22, 2017 at 11:32 AM, Austin Duncan
>> <aduncan@pyaanalytics.com> wrote:
>> > Matt,
>> >
>> > I am using JsonPathReader and using the 'Schema Text' property with a
>> > schema defined in there. I could never figure out how to use any of the
>> > other Access Strategies. It's an inventory system, so we are extracting
>> > the JSON and splitting it into a JSON file for each row. Then using this
>> > schema:
>> > {
>> >  "name": "insertSql",
>> >  "type": "record",
>> >  "fields": [
>> >   {
>> >    "name": "RfidNumber",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "CabinetName",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "ItemNumber",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "LotNumber",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "PurchaseOrderNumber",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "PurchaseOrderPrice",
>> >    "type": "float"
>> >   },
>> >   {
>> >    "name": "SupplierId",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "SupplierName",
>> >    "type": "string"
>> >   },
>> >   {
>> >    "name": "updatedate",
>> >    "type": "string"
>> >   }
>> >  ]
>> > }
>> > I inserted the data into the table. I am not 100% on the uses of
>> > schemas, so I am not quite sure what you mean by using a schema to
>> > define the query.
>> >
>> > On Tue, Aug 22, 2017 at 11:23 AM, Matt Burgess <mattyb149@apache.org>
>> > wrote:
>> >>
>> >> Austin,
>> >>
>> >> What are you using for a record reader and schema for
>> >> PutDatabaseRecord?  In order to execute SQL using PutDatabaseRecord,
>> >> you have to specify a "Field containing SQL", and the incoming
>> >> record(s) must have a field with that name. The value of that field
>> >> (for each record) will be executed.
>> >>
>> >> What I've done in the past is to put the whole SQL statement in a JSON
>> >> doc: {"query": "INSERT INTO table (column1 ,column2, column3, column4)
>> >> VALUES() ON CONFLICT (rfidnumber) DO UPDATE"} then I set Field
>> >> Containing SQL to "query", and use a JsonPathReader specifying a
>> >> "query" field with a path of $.query, or just a JsonTreeReader, either
>> >> reader using a schema with a single string field called "query".
>> >>
>> >> IMO a nice improvement would be to add a SQLReader that could split
>> >> on newlines or semicolons or whatever, so that each "record" would
>> >> contain a SQL statement with the specified field name. That would save
>> >> the trouble of having to temporarily convert your SQL statement(s) into
>> >> a format that an existing Reader can recognize.
>> >>
>> >> Regards,
>> >> Matt
>> >>
>> >>
>> >> On Tue, Aug 22, 2017 at 11:12 AM, Austin Duncan
>> >> <aduncan@pyaanalytics.com> wrote:
>> >> > In my flow I am pulling data from JSON, splitting the JSON, and then
>> >> > inserting that into a Postgres table using the PutDatabaseRecord
>> >> > processor. I have been using the insert statement option and it has
>> >> > been working fine, but now I am trying to figure out how to do an
>> >> > INSERT INTO table ON CONFLICT UPDATE statement. I have the
>> >> > statement.type attribute set to SQL and am trying to do the query:
>> >> >
>> >> > INSERT INTO table (column1 ,column2, column3, column4)
>> >> >
>> >> > VALUES()
>> >> >
>> >> > ON CONFLICT (rfidnumber) DO UPDATE;
>> >> >
>> >> > I am getting the error 'Record schema does not contain field
>> >> > containing SQL'. So two th   Any help would be appreciated.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Austin
>> >
>> >
>>
>
>

--001a113ce5a0436afc05575b3593--