Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 3556616753A for ; Tue, 22 Aug 2017 19:48:56 +0200 (CEST) Received: (qmail 5229 invoked by uid 500); 22 Aug 2017 17:48:53 -0000 Mailing-List: contact users-help@nifi.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: users@nifi.apache.org Delivered-To: mailing list users@nifi.apache.org Received: (qmail 5218 invoked by uid 99); 22 Aug 2017 17:48:53 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Aug 2017 17:48:53 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 520E3C2D2D for ; Tue, 22 Aug 2017 17:48:53 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.479 X-Spam-Level: ** X-Spam-Status: No, score=2.479 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=pyaanalytics-com.20150623.gappssmtp.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id WOFexdxIV_KX for ; Tue, 22 Aug 2017 17:48:51 +0000 (UTC) Received: from mail-oi0-f42.google.com (mail-oi0-f42.google.com [209.85.218.42]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 8C83E5F6C2 for ; Tue, 22 Aug 2017 17:48:51 +0000 (UTC) Received: by mail-oi0-f42.google.com with SMTP id f11so193170584oic.0 for ; Tue, 22 Aug 2017 10:48:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pyaanalytics-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=asnbxshHzoYjmSptjXthH/gJM93I4km+kLtE7GvhsfY=; b=s5Ol4x4Q/aWgpfqw7qAqKF7JMdUCL6hWOIWjWX+S0Lex+qV+KAC8lKuCmZyh2ypTIh 5dZUe/QG4H1p5clkj6PMBdcLAddV/W4g7/By87edpEu3jM3t0XLPd5gY1SVlxkGospZu 7s3MJJ/pHyXxvukZH0pAFgGFtTlr9MID1UmFkPZJiq7hqF+V0MVbteSPBuFz3QTnJ/VK RKLg3tD0Riufvohl3WbgcPJpzqSrtE2x6FFt1j07mi6MvWLo0szT/cQkOC/HFW86rBEL O8XHjwOsUikBy+AHKe8FBiKGv3WJCwL1IH/BfLMcjrKlU1IEptx730mdqicd85AHtyqR e/eQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=asnbxshHzoYjmSptjXthH/gJM93I4km+kLtE7GvhsfY=; b=dLiUmIkg38o8ot4Tocrqu+F1uLKSaxJ1/vW29pw+QvL0PRDMaHp2PZyd5byGhzV0MX kib7BfgfgQufxCLyVj9IctmSJFmTAO5Jc+2D2gAjQqQMOFZM7xBBVfsUBXh2igdEV87W o6VmaiPjayzxRcxsu08utKkNk7VdaCTDl+kEfApDEyzGOLyk8+MgIKKZPXKXjL9LkzNr TqsqHLSNrUNYbCns0HSe69Rjz+0fY/8XfS2Ys/+hjBobu4HQOdzKsI+XvCdM+8PttQf7 Qb7uiXt01nSNlol+6RagnCQ5NGY2Wq//paHhiMNbA9tco3S5gKUXey+bb5GJqjOwEFSN mWng== X-Gm-Message-State: AHYfb5hP152zBWyL2B4pH2p4z9feSPiCRcPKnzThPWfMVQttWg+BUMbO rDEG8HK1cvSVuVqZJD0QCx1tsqUt+EHi X-Received: by 10.202.235.4 with SMTP id j4mr1916632oih.43.1503424130797; Tue, 22 Aug 2017 10:48:50 -0700 (PDT) MIME-Version: 1.0 Received: by 10.157.56.33 with HTTP; Tue, 22 Aug 2017 10:48:50 -0700 (PDT) In-Reply-To: <47056F0C-5DB7-4FF9-91AE-08BD308B9413@gmail.com> References: <47056F0C-5DB7-4FF9-91AE-08BD308B9413@gmail.com> From: Austin Duncan Date: Tue, 22 Aug 2017 13:48:50 -0400 Message-ID: Subject: Re: Upsert To: users@nifi.apache.org Content-Type: multipart/alternative; boundary="001a113ce5a0436afc05575b3593" --001a113ce5a0436afc05575b3593 Content-Type: text/plain; charset="UTF-8" All of the records have the same schema though I think. It's all the same kind of data and the same format. We were originally using a PutSql processor and were having trouble with it. The current setup actually runs pretty well im just trying to figure out how to do an insert and then if the data already exists in the table via the rfidnumber do an update instead. On Tue, Aug 22, 2017 at 12:05 PM, Matt Burgess wrote: > Your use case will be more complex, because you need to form SQL from your > JSON. You can try ConvertJSONToSQL and then ReplaceText to add the ON > CONFLICT to the end. > > Also if you are using SplitJson then PutDatabaseRecord won't be as > efficient as it could be; it is meant to work on a number of records in a > single flow file with the same schema. In your case you might be better off > with PutSQL as it won't require schemas or statement types. The Split -> > convert to SQL -> PutSQL pattern was the original way to do things, but for > multiple records with the same schema, PutDatabaseRecord was added. > > Regards, > Matt > > On Aug 22, 2017, at 11:43 AM, Austin Duncan > wrote: > > The example you listed sounds like a solution I am just having trouble > fully understanding. I want to make my own sql so that i can do the "insert > into on conflict" > I am just having trouble really understanding what it is that I have to > do so that the query and the data will be understood by the processor. Do I > need different schemas? One like I sent and one for the query? > > On Tue, Aug 22, 2017 at 11:38 AM, Matt Burgess > wrote: > >> If your incoming data is already in fields (JSON, e.g.) and not a SQL >> statement, then the statement.type should be "insert" rather than >> "sql". The "sql" type is for passing in explicit SQL statements to >> execute rather than taking a record and generating the appropriate SQL >> statement from the fields and the statement.type. >> >> If you are trying to generate your own SQL and execute that, then >> you'd want to try something like the solution I outlined before, where >> you put the SQL in a field such as "statement", set statement.type to >> "sql" and Field Containing SQL to "statement". >> >> Does that make sense? Or am I misunderstanding what you are trying to do? >> >> Regards, >> Matt >> >> On Tue, Aug 22, 2017 at 11:32 AM, Austin Duncan >> wrote: >> > Matt, >> > >> > I am using JsonPathReader and using the 'Schema Text' Property with a >> schema >> > defined in there. I could never figure out how to use any of the other >> > Access Strategies. It's an inventory system so we are extracting the >> Json >> > and splitting it into a json file for each row. Then using this schema: >> { >> > "name": "insertSql", >> > "type": "record", >> > "fields": [ >> > { >> > "name": "RfidNumber", >> > "type": "string" >> > }, >> > { >> > "name": "CabinetName", >> > "type": "string" >> > }, >> > { >> > "name": "ItemNumber", >> > "type": "string" >> > }, >> > { >> > "name": "LotNumber", >> > "type": "string" >> > }, >> > { >> > "name": "PurchaseOrderNumber", >> > "type": "string" >> > }, >> > { >> > "name": "PurchaseOrderPrice", >> > "type": "float" >> > }, >> > { >> > "name": "SupplierId", >> > "type": "string" >> > }, >> > { >> > "name": "SupplierName", >> > "type": "string" >> > }, >> > { >> > "name": "updatedate", >> > "type": "string" >> > } >> > ] >> > } >> > I inserted the data into the table. I am not 100% on the uses of >> schemas so >> > I am not quite sure what you mean by using a schema to define the query. >> > >> > On Tue, Aug 22, 2017 at 11:23 AM, Matt Burgess >> wrote: >> >> >> >> Austin, >> >> >> >> What are you using for a record reader and schema for >> >> PutDatabaseRecord? In order to execute SQL using PutDatabaseRecord, >> >> you have to specify a "Field containing SQL", and the incoming >> >> record(s) must have a field with that name. The value of that field >> >> (for each record) will be executed. >> >> >> >> What I've done in the past is to put the whole SQL statement in a JSON >> >> doc: {"query": "INSERT INTO table (column1 ,column2, column3, column4) >> >> VALUES() ON CONFLICT (rfidnumber) DO UPDATE"} then I set Field >> >> Containing SQL to "query", and use a JsonPathReader specifying a >> >> "query" field with a path of $.query, or just a JsonTreeReader, either >> >> reader using a schema with a single string field called "query". >> >> >> >> IMO an improvement would be nice to add a SQLReader that could split >> >> on newlines or semicolons or whatever, and each "record" would contain >> >> a SQL statement with the specified field name. That would save the >> >> trouble of having to temporarily convert your SQL statement(s) into a >> >> format that an existing Reader can recognize. >> >> >> >> Regards, >> >> Matt >> >> >> >> >> >> On Tue, Aug 22, 2017 at 11:12 AM, Austin Duncan >> >> wrote: >> >> > In my flow I am pulling data from a Json, splitting the Json and then >> >> > inserting that into a postgres table using the putdatabaserecord >> >> > processor. >> >> > I have been using the insert statement option and it has been working >> >> > fine >> >> > but now I am trying to figure out how to do a INSERT INTO table ON >> >> > CONFLICT >> >> > UPDATE statement. I have the statement.type attribute set to SQL and >> am >> >> > trying to do the query: >> >> > >> >> > INSERT INTO table (column1 ,column2, column3, column4) >> >> > >> >> > VALUES() >> >> > >> >> > ON CONFLICT (rfidnumber) DO UPDATE; >> >> > >> >> > I am getting the error 'Record schema does not contain filed >> containing >> >> > SQL'. So two th Any help would be appreciated. >> >> > >> >> > Thanks, >> >> > >> >> > Austin >> > >> > >> > > --001a113ce5a0436afc05575b3593 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
All of the records have the same schema though I think. It= 's all the same kind of data and the same format. We were originally us= ing a PutSql processor and were having trouble with it. The current setup a= ctually runs pretty well im just trying to figure out how to do an insert a= nd then if the data already exists in the table via the rfidnumber do an up= date instead.

On Tue, Aug 22, 2017 at 12:05 PM, Matt Burgess <= mattyb149@gmail.co= m> wrote:
Your use case will be more complex, because you need to form SQL from= your JSON. You can try ConvertJSONToSQL and then ReplaceText to add the ON= CONFLICT to the end.

Also if y= ou are using SplitJson then PutDatabaseRecord won't be as efficient as = it could be; it is meant to work on a number of records in a single flow fi= le with the same schema. In your case you might be better off with PutSQL a= s it won't require schemas or statement types. The Split -> convert = to SQL -> PutSQL pattern was the original way to do things, but for mult= iple records with the same schema, PutDatabaseRecord was added.

Regards,
Matt

On Aug 22, 2017, at 11:43 AM, Austin Duncan <aduncan@pyaanalytics.com= > wrote:

The example you listed sounds like a solution I am just having trouble f= ully understanding. I want to make my own sql so that i can do the "in= sert into on conflict"
=C2=A0I am just having trouble really = understanding what it is that I have to do so that the query and the data w= ill be understood by the processor. Do I need different schemas? One like I= sent and one for the query?

On Tue, Aug 22, 2017 at 11:38 AM, Matt Burgess <m= attyb149@apache.org> wrote:
If your incoming data is already in fields (JSON, e.g.) and not a SQL
statement, then the statement.type should be "insert" rather than=
"sql". The "sql" type is for passing in explicit SQL st= atements to
execute rather than taking a record and generating the appropriate SQL
statement from the fields and the statement.type.

If you are trying to generate your own SQL and execute that, then
you'd want to try something like the solution I outlined before, where<= br> you put the SQL in a field such as "statement", set statement.typ= e to
"sql" and Field Containing SQL to "statement".

Does that make sense? Or am I misunderstanding what you are trying to do?
Regards,
Matt

On Tue, Aug 22, 2017 at 11:32 AM, Austin Duncan
<ad= uncan@pyaanalytics.com> wrote:
> Matt,
>
> I am using JsonPathReader and using the 'Schema Text' Property= with a schema
> defined in there. I could never figure out how to use any of the other=
> Access Strategies. It's an inventory system so we are extracting t= he Json
> and splitting it into a json file for each row. Then using this schema= : {
>=C2=A0 "name": "insertSql",
>=C2=A0 "type": "record",
>=C2=A0 "fields": [
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "RfidNumber",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "CabinetName",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "ItemNumber",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "LotNumber",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "PurchaseOrderNumber",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "PurchaseOrderPrice",
>=C2=A0 =C2=A0 "type": "float"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "SupplierId",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 "name": "SupplierName",
>=C2=A0 =C2=A0 "type": "string"
>=C2=A0 =C2=A0},
>=C2=A0 =C2=A0{
>=C2=A0 =C2=A0 =C2=A0"name": "updatedate",
>=C2=A0 =C2=A0 =C2=A0"type": "string"
>=C2=A0 =C2=A0 =C2=A0}
>=C2=A0 ]
> }
> I inserted the data into the table. I am not 100% on the uses of schem= as so
> I am not quite sure what you mean by using a schema to define the quer= y.
>
> On Tue, Aug 22, 2017 at 11:23 AM, Matt Burgess <mattyb149@apache.org> wrote:<= br> >>
>> Austin,
>>
>> What are you using for a record reader and schema for
>> PutDatabaseRecord?=C2=A0 In order to execute SQL using PutDatabase= Record,
>> you have to specify a "Field containing SQL", and the in= coming
>> record(s) must have a field with that name. The value of that fiel= d
>> (for each record) will be executed.
>>
>> What I've done in the past is to put the whole SQL statement i= n a JSON
>> doc: {"query": "INSERT INTO table (column1 ,column2= , column3, column4)
>> VALUES() ON CONFLICT (rfidnumber) DO UPDATE"} then I set Fiel= d
>> Containing SQL to "query", and use a JsonPathReader spec= ifying a
>> "query" field with a path of $.query, or just a JsonTree= Reader, either
>> reader using a schema with a single string field called "quer= y".
>>
>> IMO an improvement would be nice to add a SQLReader that could spl= it
>> on newlines or semicolons or whatever, and each "record"= would contain
>> a SQL statement with the specified field name. That would save the=
>> trouble of having to temporarily convert your SQL statement(s) int= o a
>> format that an existing Reader can recognize.
>>
>> Regards,
>> Matt
>>
>>
>> On Tue, Aug 22, 2017 at 11:12 AM, Austin Duncan
>> <= aduncan@pyaanalytics.com> wrote:
>> > In my flow I am pulling data from a Json, splitting the Json = and then
>> > inserting that into a postgres table using the putdatabaserec= ord
>> > processor.
>> > I have been using the insert statement option and it has been= working
>> > fine
>> > but now I am trying to figure out how to do a INSERT INTO tab= le ON
>> > CONFLICT
>> > UPDATE statement. I have the statement.type attribute set to = SQL and am
>> > trying to do the query:
>> >
>> > INSERT INTO table (column1 ,column2, column3, column4)
>> >
>> > VALUES()
>> >
>> > ON CONFLICT (rfidnumber) DO UPDATE;
>> >
>> > I am getting the error 'Record schema does not contain fi= led containing
>> > SQL'. So two th=C2=A0 =C2=A0Any help would be appreciated= .
>> >
>> > Thanks,
>> >
>> > Austin
>
>


--001a113ce5a0436afc05575b3593--