nifi-dev mailing list archives

From Arturo Michel <Arturo.Mic...@leotech.com.sg>
Subject Hadoop Sequence File Processor changes the key.
Date Fri, 13 May 2016 08:49:35 GMT
I am using the CreateHadoopSequenceFile processor to timestamp incoming data: it creates a
sequence file with the current time as the key and the data as the value.


To do this I set the filename attribute (temporarily) to ${now()}, so that I get a sequence
file where the key is the time and the value is the data. However, the processor appends the
.sf suffix to the file name, and the suffix makes it all the way into the key.


I end up with the following structure: [40668712567.sf | [data bytes]]


I understand that the file is written as filename.sf, but shouldn't the key omit the .sf suffix
and contain only the file name?


Looking at the processor code in

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/CreateHadoopSequenceFile.java


    final String fileName = flowFile.getAttribute(CoreAttributes.FILENAME.key()) + ".sf";
    flowFile = session.putAttribute(flowFile, CoreAttributes.FILENAME.key(), fileName);
    try {
        flowFile = sequenceFileWriter.writeSequenceFile(flowFile, session, getConfiguration(), compressionType);
        session.transfer(flowFile, RELATIONSHIP_SUCCESS);
        getLogger().info("Transferred flowfile {} to {}", new Object[]{flowFile, RELATIONSHIP_SUCCESS});
    } catch (ProcessException e) {
        getLogger().error("Failed to create Sequence File. Transferring {} to 'failure'", new Object[]{flowFile}, e);
        session.transfer(flowFile, RELATIONSHIP_FAILURE);
    }



The file name is changed before the flow file is passed to the writer. The default sequence
file writer (and, I think, the others too) uses the file name as received when it writes the key.


https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/SequenceFileWriterImpl.java


    String key = flowFile.getAttribute(CoreAttributes.FILENAME.key());
    writer.append(new Text(key), inStreamWritable);
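To illustrate the behaviour I would expect, here is a minimal, self-contained sketch in plain Java (no NiFi or Hadoop classes). The class SequenceKeySketch and the method keyFor are hypothetical names made up for this example, not part of NiFi; the sketch only assumes that the key should be the file name with a trailing .sf removed.

```java
// Hypothetical helper, not part of NiFi: derives the sequence file key
// from a file name by dropping a trailing ".sf" suffix if one is present.
final class SequenceKeySketch {
    private static final String SUFFIX = ".sf";

    static String keyFor(String fileName) {
        if (fileName != null && fileName.endsWith(SUFFIX)) {
            // Cut off exactly the suffix, leaving the original file name.
            return fileName.substring(0, fileName.length() - SUFFIX.length());
        }
        // No suffix (or no name at all): use the value unchanged.
        return fileName;
    }

    public static void main(String[] args) {
        // Mimics the processor: the suffix is appended before the writer reads the name.
        String fileName = "40668712567" + SUFFIX;
        System.out.println("file name: " + fileName);
        System.out.println("key:       " + keyFor(fileName));
    }
}
```

With something along these lines the file could still be named filename.sf while the key stays equal to the original file name.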


Is there a better way of accomplishing this?



Best Regards.









