Date: Tue, 27 Oct 2015 21:56:27 +0000 (UTC)
From: "Joseph Percivall (JIRA)"
To: commits@nifi.apache.org
Reply-To: dev@nifi.apache.org
Subject: [jira] [Updated] (NIFI-1077) Allow ConvertCharacterSet to accept expression language


     [ https://issues.apache.org/jira/browse/NIFI-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Percivall updated NIFI-1077:
-----------------------------------
    Attachment: NIFI-1077.patch

Enabled expression language and added unit tests

> Allow ConvertCharacterSet to accept expression language
> -------------------------------------------------------
>
>                 Key: NIFI-1077
>                 URL: https://issues.apache.org/jira/browse/NIFI-1077
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joseph Percivall
>            Assignee: Joseph Percivall
>            Priority: Minor
>         Attachments: NIFI-1077.patch
>
>
> This issue arose from a user on the mailing list. It demonstrates the need to be able to use expression language to set the incoming (and potentially outgoing) character sets:
> I'm looking to process many files into common formats. The source files are coming in various character sets, MIME types, and newline terminators.
> My thinking for a data flow was along these lines:
> GetFile (from many sub directories) ->
> ExecuteStreamCommand (file -i) ->
> ConvertCharacterSet (from previous command to utf8) ->
> ReplaceText (to change any \r\n into \n) ->
> PutFile (into a directory structure based on values found in the original file path and filename)
> Additional steps would be added for archiving a copy of the original, converting xml files, etc.
> Attempting to process these with NiFi leaves me confused as to how to process within the tool. If I want to ConvertCharacterSet, I have to know the input type. I set up an ExecuteStreamCommand to file -i ${absolute.path:append(${filename})} which returned the expected values. I don't see a way to turn these results into input for the processor, which doesn't accept expression language for that field.
> I also considered ConvertCSVToAvro as an interim step but noticed the same issue. Any suggestions what this dataflow should look like?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
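
For illustration only, the per-file logic that the quoted flow is after (read the character set reported by file -i, convert to UTF-8, change \r\n into \n) can be sketched outside NiFi in Python. This is a minimal sketch, not NiFi code and not the contents of NIFI-1077.patch. It assumes file -i prints a line of the form "name: type/subtype; charset=name"; the function names parse_charset and to_utf8_unix and the sample file name are hypothetical.

```python
import re
from typing import Optional

# Assumption: `file -i` output looks like "<name>: <mime/type>; charset=<name>".
_CHARSET_RE = re.compile(r"charset=([A-Za-z0-9._-]+)")


def parse_charset(file_output: str) -> Optional[str]:
    """Return the charset named in a `file -i` style line, or None if absent."""
    match = _CHARSET_RE.search(file_output)
    return match.group(1) if match else None


def to_utf8_unix(data: bytes, source_charset: str) -> bytes:
    """Decode from source_charset, turn \\r\\n into \\n, and re-encode as UTF-8."""
    text = data.decode(source_charset)
    return text.replace("\r\n", "\n").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample line standing in for the ExecuteStreamCommand result.
    reported = "report.csv: text/plain; charset=iso-8859-1"
    charset = parse_charset(reported)
    raw = "caf\u00e9\r\nbar\r\n".encode("iso-8859-1")
    if charset is not None:
        print(to_utf8_unix(raw, charset))
```

The detected charset is a per-file value, which is why the request asks for the ConvertCharacterSet property to accept expression language: a fixed property value cannot vary from one flow file to the next.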