From: Arturo Michel
To: "dev@nifi.apache.org"
Subject: Hadoop Sequence File Processor changes the key.
Date: Fri, 13 May 2016 08:49:35 +0000
Message-ID: <2a44f2befa2947439d2bd69b46c92e0f@leotech.com.sg>

I am using the CreateHadoopSequenceFile processor to create a sequence file from incoming data in order to timestamp the data, using the current time as the key and the data as the value of the sequence file.

I temporarily change the filename attribute to ${now()} so that I get a sequence file where the key is the time and the content is the data. However, the processor adds the .sf suffix, which makes it all the way into the key. I end up with the following structure:

    [40668712567.sf | [data bytes]]

I understand that the file is written as filename.sf, but shouldn't the key omit the .sf suffix and contain only the file name?

Looking at the processor code in
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/CreateHadoopSequenceFile.java

    final String fileName = flowFile.getAttribute(CoreAttributes.FILENAME.key()) + ".sf";
    flowFile = session.putAttribute(flowFile, CoreAttributes.FILENAME.key(), fileName);
    try {
        flowFile = sequenceFileWriter.writeSequenceFile(flowFile, session, getConfiguration(), compressionType);
        session.transfer(flowFile, RELATIONSHIP_SUCCESS);
        getLogger().info("Transferred flowfile {} to {}", new Object[]{flowFile, RELATIONSHIP_SUCCESS});
    } catch (ProcessException e) {
        getLogger().error("Failed to create Sequence File. Transferring {} to 'failure'", new Object[]{flowFile}, e);
        session.transfer(flowFile, RELATIONSHIP_FAILURE);
    }

The file name is changed before the flow file is passed to the writer. The default sequence file writer (and, I think, the others as well) uses the file name as received to write the key.

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/SequenceFileWriterImpl.java

    String key = flowFile.getAttribute(CoreAttributes.FILENAME.key());
    writer.append(new Text(key), inStreamWritable);

Is there a better way of accomplishing this?

Best regards.
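To make the question concrete, here is a minimal sketch of the behaviour I would have expected for the key. It assumes the key should simply be the filename attribute with the appended ".sf" removed. The class SequenceKeys and the method keyFor are hypothetical names used only for illustration; they are not part of NiFi, and the sketch deliberately avoids the NiFi and Hadoop APIs so that it can be run on its own.

```java
// Hypothetical helper, not NiFi code: derives the sequence file key from the
// filename attribute by dropping the ".sf" suffix that the processor appends.
final class SequenceKeys {
    // Suffix appended to the filename attribute by CreateHadoopSequenceFile.
    static final String SUFFIX = ".sf";

    private SequenceKeys() {
    }

    // Returns fileName without one trailing ".sf"; other names (and null)
    // are returned unchanged.
    static String keyFor(String fileName) {
        if (fileName != null && fileName.endsWith(SUFFIX)) {
            return fileName.substring(0, fileName.length() - SUFFIX.length());
        }
        return fileName;
    }

    public static void main(String[] args) {
        // The filename from the example above, as the writer receives it.
        System.out.println(keyFor("40668712567.sf")); // prints 40668712567
    }
}
```

Only one trailing suffix is removed, so a file that was already named data.sf before the processor renamed it to data.sf.sf would still get data.sf as its key. Whether the writer should strip the suffix, or the processor should rename the flow file only after writing, is a design question I would leave to the list.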