nifi-users mailing list archives

From Huagen peng <huagen.p...@gmail.com>
Subject Re: Wildcard character in the Command Argument field of the ExecuteStreamCommand processor
Date Tue, 31 May 2016 19:08:55 GMT
Thank you for your suggestions, Andy and Lee.

I am aware of the ListFile-FetchFile-HashContent flow. I didn't go that route because the
ListFile processor does not accept input from an upstream processor. I have an upstream
processor that tells me which directory I want to work with. I ended up passing the directory
name into the ExecuteStreamCommand processor to get ALL the files under that directory. After
that, I use SplitText and ExtractText to keep only the files with the desired extension, and
then I use FetchFile and HashContent to finish the job.
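The SplitText and ExtractText steps described above boil down to splitting a file listing into one line per file and keeping only the names with the wanted extension. The Python sketch below mirrors that filtering logic outside NiFi. It assumes the listing is newline-separated file names, and `filter_listing` and `GZ_PATTERN` are hypothetical names, not NiFi APIs.

```python
import re

# Hypothetical filter: non-hidden names (first character is not a dot) ending in ".gz".
GZ_PATTERN = re.compile(r"[^.].*\.gz")

def filter_listing(listing: str, pattern=GZ_PATTERN) -> list:
    """Split a newline-separated listing (like SplitText) and keep full matches (like ExtractText)."""
    lines = (line.strip() for line in listing.splitlines())
    return [name for name in lines if name and pattern.fullmatch(name)]

listing = "a.gz\nnotes.txt\n.hidden.gz\nlogs-16-05-22.gz\n"
print(filter_listing(listing))  # ['a.gz', 'logs-16-05-22.gz']
```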

If ListFile accepted upstream input, it would have made my data flow much easier. The same
goes for the ListSFTP processor.

Huagen

> On May 31, 2016, at 2:56 PM, Lee Laim <lee.laim@gmail.com> wrote:
> 
> Huagen,
> 
> I had a similar workflow and eventually replaced ExecuteStreamCommand (md5sum) with HashContent.
> 
> Using ListFile->FetchFile->HashContent, the resulting hash is placed into the
> flowfile under the attribute ${hash.value}.
> This processor offers ~40 algorithms to choose from, including md5. Compared to
> ExecuteStreamCommand, the HashContent processor offers somewhat better error handling and
> lineage traceability in this specific case.
> 
> Thanks,
> -Lee
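For comparison, the core of what HashContent does, as described above, is to digest the flowfile content and record the hex result as an attribute. A rough Python analogue, assuming a plain dict stands in for flowfile attributes; the `FlowFile` dataclass and `hash_content` function are hypothetical illustrations, not NiFi's API.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi flowfile: raw content plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)

def hash_content(ff: FlowFile, algorithm: str = "md5", attribute: str = "hash.value") -> FlowFile:
    """Digest the content with a hashlib algorithm and store the hex string in an attribute."""
    ff.attributes[attribute] = hashlib.new(algorithm, ff.content).hexdigest()
    return ff

flowfile = hash_content(FlowFile(b"abc"))
print(flowfile.attributes["hash.value"])  # 900150983cd24fb0d6963f7d28e17f72
```

Changing the `algorithm` argument (for example to `"sha256"`) plays the role of picking a different algorithm in the processor.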
> 
> 
> On Tue, May 31, 2016 at 11:24 AM, Andy LoPresto <alopresto@apache.org> wrote:
> Huagen,
> 
> The ExecuteStreamCommand processor runs a command against each incoming flowfile (the
> flowfile's content is streamed to the command). For example, you could have a ListFile
> processor list all .gz files in the directory and pass them to the ExecuteStreamCommand
> processor to generate the MD5 hash of each. In this case, you would not need a wildcard
> character in the command.
> 
> The configuration for the processors is as follows:
> 
> ListFile:
> 	-Input Directory: <the directory where the files are located>
> 	-File Filter: [^\.].*\.gz
> 
> ExecuteStreamCommand:
> 	-Command Arguments: ${filename}
> 	-Command Path: md5
> 	-Working Directory: <the directory where the files are located>
> 	-Output Destination Attribute: md5hash
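ListFile's File Filter is a regular expression. Assuming it must match the entire file name (full-match semantics, like Java's `Matcher.matches()`), `[^\.]\.gz` accepts only names with a single character before `.gz`, while `[^\.].*\.gz` accepts any non-hidden name ending in `.gz`. A quick Python check, using `re.fullmatch` to stand in for full-match semantics; these constructs behave the same in Python and Java regex, and `accepted` is a hypothetical helper.

```python
import re

def accepted(pattern: str, names: list) -> list:
    """Return the names that the pattern matches in full (like Java's Matcher.matches())."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]

names = ["a.gz", "data.gz", ".hidden.gz", "data.gz.bak", "notes.txt"]

print(accepted(r"[^\.]\.gz", names))    # ['a.gz']            exactly one character, then ".gz"
print(accepted(r"[^\.].*\.gz", names))  # ['a.gz', 'data.gz'] any non-hidden name ending in ".gz"
```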
> 
> Notes:
> 	-I am using “md5” rather than “md5sum” as I am on Mac OS X. 
> 	-You could use the “-q” flag for “md5” to print only the checksum and suppress extraneous output
> 	-You could use “${absolute.path}/${filename}” as the command arguments, in which case you would not need to set the working directory
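If the extraneous output is not suppressed, the output attribute holds the tool's whole output line rather than the bare digest: GNU `md5sum` prints `<digest>  <file>`, while macOS `md5` prints `MD5 (<file>) = <digest>`. A small Python sketch that pulls the digest out of either single-line format, or accepts a bare digest as produced by a quiet mode; `extract_md5` is a hypothetical helper, not part of NiFi.

```python
import re
from typing import Optional

# Single-line MD5 output shapes; an MD5 hex digest is 32 characters.
BARE = re.compile(r"([0-9a-fA-F]{32})")                # quiet mode: digest only
BSD = re.compile(r"MD5 \((.*)\) = ([0-9a-fA-F]{32})")  # macOS md5: MD5 (file) = digest
GNU = re.compile(r"([0-9a-fA-F]{32}) [ *](.*)")        # md5sum: digest, space, ' ' or '*', file

def extract_md5(output: str) -> Optional[str]:
    """Return the lowercase MD5 digest from one line of md5/md5sum output, else None."""
    line = output.strip()
    for pattern, group in ((BARE, 1), (BSD, 2), (GNU, 1)):
        match = pattern.fullmatch(line)
        if match:
            return match.group(group).lower()
    return None

print(extract_md5("MD5 (empty.gz) = d41d8cd98f00b204e9800998ecf8427e"))
```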
>  
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On May 31, 2016, at 7:02 AM, Huagen peng <huagen.peng@gmail.com> wrote:
>> 
>> Hi, I would like to run an md5sum command on all the *.gz files under a certain directory.
>> However, I keep getting this error:
>> md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory
>> 
>> I tried quoting the * wildcard character and adding a dot (.) or slash (/) in front, to no
>> avail. Can I do something like this with the ExecuteStreamCommand processor?
>> 
>> Thanks.
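The error above shows `md5sum` receiving the literal string `*.gz`. Expanding wildcards is the job of a shell, and ExecuteStreamCommand launches the command directly (presumably via Java's `ProcessBuilder`) rather than through a shell, so quoting or prefixing the pattern cannot help. The Python sketch below reproduces the effect with `subprocess.run` on an argument list (no `shell=True`), then expands the pattern with `glob` the way a shell would. It uses a throwaway directory with hypothetical sample files, and a child Python process that echoes its arguments in place of `md5sum`.

```python
import glob
import os
import subprocess
import sys
import tempfile

# Child program that simply echoes the arguments it was given, one per line.
ECHO_ARGS = "import sys; print('\\n'.join(sys.argv[1:]))"

def argv_received(args, cwd):
    """Launch a child WITHOUT a shell (argument list, no shell=True) and return its argv."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGS, *args],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical sample files standing in for the real transfer directory.
    for name in ("a.gz", "b.gz", "c.txt"):
        open(os.path.join(workdir, name), "wb").close()

    # No shell involved: the child gets the literal "*.gz", just as md5sum did.
    literal = argv_received(["*.gz"], cwd=workdir)

    # Expanding the pattern ourselves (what a shell would do) yields the real names.
    matches = sorted(os.path.basename(p) for p in glob.glob(os.path.join(workdir, "*.gz")))
    expanded = argv_received(matches, cwd=workdir)

print(literal)   # ['*.gz']
print(expanded)  # ['a.gz', 'b.gz']
```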
> 
> 

