nifi-dev mailing list archives

From Arturo Michel <Arturo.Mic...@leotech.com.sg>
Subject Re: Hadoop Sequence File Processor changes the key.
Date Thu, 19 May 2016 14:27:21 GMT
Hi Bryan,

Thanks for the response.

I would actually advocate implementing both changes, although the ".sf" suffix in the filename
already has a workaround.

The ".sf" suffix can be changed after creation by modifying the filename with the UpdateAttribute
processor.
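As an illustration of that workaround, the sketch below models what an UpdateAttribute property such as filename = ${filename:substringBeforeLast('.sf')} would do to the attribute. It is plain Java under stated assumptions: a Map stands in for the flow file attributes, and the class name is hypothetical, not NiFi API. Renaming only changes the filename attribute; the key already written inside the sequence file is untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: models only the flow file's attribute map, not the NiFi API.
final class StripSuffixSketch {

    // Mirrors NiFi Expression Language substringBeforeLast: returns everything before
    // the last occurrence of the search string, or the whole subject if it is absent.
    static String substringBeforeLast(String subject, String search) {
        int idx = subject.lastIndexOf(search);
        return idx < 0 ? subject : subject.substring(0, idx);
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        // Filename as left by CreateHadoopSequenceFile (example value from this thread).
        attributes.put("filename", "40668712567.sf");

        // Equivalent of an UpdateAttribute property: filename = ${filename:substringBeforeLast('.sf')}
        attributes.put("filename", substringBeforeLast(attributes.get("filename"), ".sf"));

        System.out.println(attributes.get("filename")); // prints 40668712567
    }
}
```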

As for the key, there is no way to manipulate it after the file has been created (this is
expected). The key value, however, should be independent of the filename attribute.
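To make the coupling concrete, here is a minimal plain-Java model of the two steps quoted later in this thread: CreateHadoopSequenceFile appends ".sf" to the filename attribute, and SequenceFileWriterImpl then reads the key from that same attribute. It is a simplified sketch, assuming a Map in place of the flow file attributes; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the current behaviour; plain maps stand in for FlowFile attributes.
final class CurrentKeySketch {

    // Step 1 (as in CreateHadoopSequenceFile): the suffix is appended to the filename attribute.
    static Map<String, String> appendSuffix(Map<String, String> attributes) {
        Map<String, String> updated = new HashMap<>(attributes);
        updated.put("filename", attributes.get("filename") + ".sf");
        return updated;
    }

    // Step 2 (as in SequenceFileWriterImpl): the key is read from the already modified filename.
    static String keyFor(Map<String, String> attributes) {
        return attributes.get("filename");
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("filename", "40668712567"); // timestamp-like value set upstream

        Map<String, String> written = appendSuffix(attributes);
        System.out.println(keyFor(written)); // prints 40668712567.sf
    }
}
```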

Your proposed solutions seem like the best way of achieving this.


Best Regards.
Arturo

________________________________________
From: Bryan Bende <bbende@gmail.com>
Sent: 18 May 2016 02:51
To: dev@nifi.apache.org
Subject: Re: Hadoop Sequence File Processor changes the key.

Hi Arturo,

Sorry for the delayed response, and thanks for pointing this out.

I don't have much experience using sequence files, but assuming the
".sf" suffix has no meaning beyond aesthetics, it seems like there
could be two possible solutions...

One would be to stop forcing the ".sf" suffix onto the filename; anyone
who wants that suffix could then set the filename using
UpdateAttribute.
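A rough sketch of how the first option would play out, assuming the processor simply leaves the filename attribute alone: the key is the filename exactly as received, and a downstream UpdateAttribute (for example filename = ${filename}.sf) adds the suffix afterwards. This is plain Java with a Map standing in for the flow file attributes; the class and method names are hypothetical and nothing here is existing NiFi code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of option 1; plain maps stand in for FlowFile attributes.
final class OptionOneSketch {

    // Under option 1 the processor would no longer rewrite "filename",
    // so the writer would use the filename exactly as received for the key.
    static String keyFor(Map<String, String> attributes) {
        return attributes.get("filename");
    }

    // A downstream UpdateAttribute (e.g. filename = ${filename}.sf) could add the suffix
    // afterwards; this returns a new map and does not touch the key already written.
    static Map<String, String> addSuffixDownstream(Map<String, String> attributes) {
        Map<String, String> updated = new HashMap<>(attributes);
        updated.put("filename", attributes.get("filename") + ".sf");
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("filename", "40668712567");

        String key = keyFor(attributes); // key written into the sequence file
        Map<String, String> renamed = addSuffixDownstream(attributes);

        System.out.println(key + " | " + renamed.get("filename")); // prints 40668712567 | 40668712567.sf
    }
}
```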

The other option would be to stop using filename as the key. We could
add another property, such as "Key Attribute", whose value would be the
name of the attribute to use as the key. This way filename can still end
in ".sf" while the key is something else.
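A minimal sketch of the "Key Attribute" idea, assuming the property value is simply the name of an attribute to read. Everything here is hypothetical (class, method, and the "timestamp" attribute); a Map stands in for the flow file attributes, and the fallback to filename when the property is unset or the attribute is missing is an assumption, not part of the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of option 2; nothing here is existing NiFi API.
final class KeyAttributeSketch {

    // Returns the value of the attribute named by keyAttribute.
    // Assumption: if the property is unset or the attribute is missing,
    // fall back to "filename" so existing flows keep their current behaviour.
    static String resolveKey(Map<String, String> attributes, String keyAttribute) {
        if (keyAttribute != null && !keyAttribute.isEmpty()) {
            String value = attributes.get(keyAttribute);
            if (value != null) {
                return value;
            }
        }
        return attributes.get("filename");
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("filename", "40668712567.sf"); // suffix kept for the file name
        attributes.put("timestamp", "40668712567");   // hypothetical attribute holding the desired key

        System.out.println(resolveKey(attributes, "timestamp")); // prints 40668712567
        System.out.println(resolveKey(attributes, null));        // prints 40668712567.sf
    }
}
```

Falling back to filename keeps today's behaviour for existing flows; routing the flow file to failure when the named attribute is missing would be an equally reasonable design.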

I lean towards the second approach. What do you think?

-Bryan


On Fri, May 13, 2016 at 4:49 AM, Arturo Michel <Arturo.Michel@leotech.com.sg> wrote:

> I am using the CreateHadoopSequenceFile processor to create a sequence
> file from incoming data in order to timestamp it, using the current time
> as the key and the data as the value of the sequence file.
>
>
> I temporarily change the filename attribute to ${now()} so as to get a
> sequence file where the key is the time and the value is the data.
> However, the processor appends the .sf suffix, which makes it all the way
> into the key.
>
>
> I end up with the following structure: [40668712567.sf | [data bytes]]
>
>
> I understand that the file is written as filename.sf, but shouldn't the key
> omit the .sf suffix and be only the filename?
>
>
> Looking at the processor code in
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/CreateHadoopSequenceFile.java
>
>
> // The ".sf" suffix is appended to the filename attribute before the writer sees it.
> final String fileName = flowFile.getAttribute(CoreAttributes.FILENAME.key()) + ".sf";
> flowFile = session.putAttribute(flowFile, CoreAttributes.FILENAME.key(), fileName);
> try {
>     flowFile = sequenceFileWriter.writeSequenceFile(flowFile, session, getConfiguration(), compressionType);
>     session.transfer(flowFile, RELATIONSHIP_SUCCESS);
>     getLogger().info("Transferred flowfile {} to {}", new Object[]{flowFile, RELATIONSHIP_SUCCESS});
> } catch (ProcessException e) {
>     getLogger().error("Failed to create Sequence File. Transferring {} to 'failure'", new Object[]{flowFile}, e);
>     session.transfer(flowFile, RELATIONSHIP_FAILURE);
> }
>
>
>
> The filename is changed before the flow file is passed to the writer. The
> default sequence file writer (and, I think, the others too) uses the filename
> as received to write the key.
>
>
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/SequenceFileWriterImpl.java
>
>
> String key = flowFile.getAttribute(CoreAttributes.FILENAME.key());
> writer.append(new Text(key), inStreamWritable);
>
>
> Is there a better way of accomplishing this?
>
> Best Regards.
>
> This email is intended only for the individual or entity to which it is
> addressed and may contain information that is private, restricted,
> confidential or secret and exempt from disclosure under applicable law.
> If the reader of this disclaimer is not the intended recipient, you are
> hereby notified that any dissemination, distribution or copying of this
> document is strictly prohibited. If you received this in error, please
> notify the sender and delete it immediately after reading this disclaimer.
> Thank you.