From: Bryan Bende
Date: Fri, 19 May 2017 15:04:04 -0400
Subject: Re: NiFi 1.2.0 Record processors question
To: dev@nifi.apache.org

When a reader produces a record, it attaches the schema it used to the
record, but we currently don't have a way for a writer to use that schema
when writing a record. I think we do want to support that, with something
like a "Use Schema in Record" option as a choice in the 'Schema Access
Strategy' of writers.

For now, when a processor uses both a reader and a writer and you want to
read and write with the same schema, you still have to define that schema
for the writer, even if the CSV reader inferred its schema from the headers.

Some processors only use a reader, like PutDatabaseRecord, and there using
the CSV header is still helpful. There are also a lot of cases where you
would write with a different schema than you read with, so using the CSV
header for reading is helpful in those cases too.

Hopefully I am making sense and not confusing things more.

On Fri, May 19, 2017 at 1:27 PM, Joe Gresock wrote:

> Matt,
>
> Great response, this does help explain a lot. Reading through your post
> made me realize I didn't understand the AvroSchemaRegistry. I had been
> thinking it was something that NiFi processors populated, but I re-read its
> usage description and it does indeed say to use dynamic properties for the
> schema name / value. In that case, I can definitely see how this is not
> dynamic in the sense of inferring any schemas on the flow. It makes me
> wonder if there would be a way to populate the schema registry from flow
> files. When I first glanced at the processors, I had assumed that when the
> schema was inferred from the CSV headers, it would create an entry in the
> AvroSchemaRegistry, provided you filled in the correct properties. Clearly
> this is not how it works.
>
> Just so I understand, does your first paragraph mean that even if you use
> the CSV headers to determine the schema, you still can't use the *Record
> processors unless you manually register a matching schema in the schema
> registry, or otherwise somehow set the schema in an attribute? In this
> case, it almost seems like inferring the schema from the CSV headers serves
> no purpose, and I don't see how NIFI-3921 would alleviate that (it only
> appears to address Avro flow files with an embedded schema).
>
> Based on this understanding, I was able to successfully get the following
> flow working:
> InferAvroSchema -> QueryRecord.
>
> QueryRecord uses CSVReader with "Use Schema Text Property" and Schema Text
> set to ${inferred.avro.schema} (which is populated by the InferAvroSchema
> processor). It also uses JsonRecordSetWriter with a similar
> configuration. I could attach a template, but I don't know the best way to
> do that on the listserv.
>
> Joe
>
> On Fri, May 19, 2017 at 4:59 PM, Matt Burgess wrote:
>
>> Joe,
>>
>> Using the CSV Headers to determine the schema is currently the only
>> "dynamic" schema strategy, so it will be tricky to use with the other
>> Readers/Writers and associated processors (which require an explicit
>> schema). This should be alleviated with NIFI-3921 [1]. For this first
>> release, I believe the approach would be to identify the various
>> schemas for your incoming/outgoing data, create a Schema Registry with
>> all of them, then configure the various Record Readers/Writers to use
>> those.
>>
>> There were some issues during development related to using the
>> incoming vs outgoing schema for various record operations; if
>> QueryRecord seems to be using the output schema for querying then it
>> is likely a bug. However, in this case it might just be that you need
>> an explicit schema for your Writer that matches the input schema
>> (which is inferred from the CSV header).
>> The CSV Header inference currently assumes all fields are Strings, so
>> a nominal schema would have the same number of fields as columns, each
>> with type String. If you don't know the number of columns and/or the
>> column names are dynamic per CSV file, I believe we have a gap here
>> (for now).
>>
>> I thought of maybe having an InferRecordSchema processor that would
>> fill in the avro.text attribute for use in various downstream record
>> readers/writers, but inferring schemas in general is not a trivial
>> task. An easier interim solution might be to have an
>> AddSchemaAsAttribute processor, which takes a Reader to parse the
>> records and determine the schema (whether dynamic or static), and set
>> the avro.text attribute on the original incoming flow file, then
>> transfer the original flow file. This would require two reads, one by
>> AddSchemaAsAttribute and one by the downstream record processor, but
>> it should be fairly easy to implement. Then again, since new features
>> would go into 1.3.0, hopefully NIFI-3921 will be implemented by then,
>> rendering all this moot :)
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-3921
>>
>> On Fri, May 19, 2017 at 12:25 PM, Joe Gresock wrote:
>> > I've tried a couple different configurations of CSVReader /
>> > JsonRecordSetWriter with the QueryRecord processor, and I don't think I
>> > quite have the usage down yet.
>> >
>> > Can someone give a basic example configuration in the following 2
>> > scenarios? I followed most of Matt Burgess's response to the post titled
>> > "How to use ConvertRecord Processor", but I don't think it tells the
>> > whole story.
>> >
>> > 1) QueryRecord, converting CSV to JSON, using only the CSV headers to
>> > determine the schema.
>> > (I tried selecting Use String Fields from Header in
>> > CSVReader, but the processor really seems to want to use the
>> > JsonRecordSetWriter to determine the schema)
>> >
>> > 2) QueryRecord, converting CSV to JSON, using a cached Avro schema. I
>> > imagine I need to use InferAvroSchema here, but I'm not sure how to cache
>> > it in the AvroSchemaRegistry. Also not quite sure how to configure the
>> > properties of each controller service in this case.
>> >
>> > Any help would be appreciated.
>> >
>> > Joe
>> >
>> > --
>> > I know what it is to be in need, and I know what it is to have plenty. I
>> > have learned the secret of being content in any and every situation,
>> > whether well fed or hungry, whether living in plenty or in want. I can
>> > do all this through him who gives me strength. *-Philippians 4:12-13*
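
For anyone following the thread, here is a rough sketch of the "nominal schema" Matt describes for CSV header inference: one field per column, each typed as a string. It is plain Python run outside NiFi, not NiFi's own code. The function name, the record name "nifiRecord", and the sample data are made up for illustration, and the exact schema NiFi produces may differ (for example, it may treat fields as nullable). The sketch also does not sanitize column names, so a header containing spaces would not give valid Avro field names.

```python
import csv
import io
import json


def string_schema_from_header(csv_text, record_name="nifiRecord"):
    """Build an Avro record schema in which every CSV column is a string.

    Hypothetical helper: reads only the first (header) row of csv_text.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    fields = [{"name": column.strip(), "type": "string"} for column in header]
    return {"type": "record", "name": record_name, "fields": fields}


if __name__ == "__main__":
    # Made-up sample data; only the header row influences the schema.
    sample = "id,name,city\n1,Ann,Oslo\n"
    print(json.dumps(string_schema_from_header(sample), indent=2))
```

The resulting JSON text is the kind of value you would paste into a writer's Schema Text property, or register under a name in the AvroSchemaRegistry, so that the writer's explicit schema matches what the CSV reader inferred.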