nifi-commits mailing list archives

From "Michal Klempa (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NIFI-1562) ExecuteStreamCommand and ExecuteProcess do not support empty command line arguments
Date Thu, 25 Feb 2016 07:36:18 GMT

     [ https://issues.apache.org/jira/browse/NIFI-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Klempa updated NIFI-1562:
--------------------------------
    Description: 
Argument splitting trims whitespace around the whole argument list and again around each
individual argument.
This causes wrong behavior when the DataFlow Manager needs to pass an empty string as an argument
to a command using ExecuteStreamCommand or ExecuteProcess.

Let's start with what the DataFlow Manager needs to achieve (steps to reproduce):
1. Create a file "test.tsv" with *TAB* separated content:
{code}
one	two	three
this	is	one	string
{code}
2. Add a GetFile Processor to the DataFlow to pick up this file.
3. Connect GetFile to ExecuteStreamCommand.
4. ExecuteStreamCommand configuration: 
 - Command Path: cut
 - Command Arguments: {code}-f;1,2,3,4;--output-delimiter;{code}
 - auto terminate: original
5. Add LogAttribute (Log Payload: true, autoterminate: success) and connect ExecuteStreamCommand
to LogAttribute to see the output.
6. Run this Flow.

Expected output:
{code}
onetwothree
thisisonestring
{code}
Because the --output-delimiter argument to the cut command is an empty string (notice the trailing
semicolon in the argument list), cut effectively joins the columns.
The same output can be obtained by running this command in bash:
{code}
$ cut -f 1,2,3,4 --output-delimiter '' test.tsv
{code}
The two apostrophes tell bash to pass an empty argument.
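
For comparison, the same argument vector can be written directly in Java. This is only a sketch: the class {{EmptyArgSketch}} and the method {{buildCutCommand}} are hypothetical names for illustration and are not part of NiFi. It shows that an empty string is an ordinary element of the command list handed to {{java.lang.ProcessBuilder}}, so the argument is lost during splitting and not when the process is launched.

```java
import java.util.Arrays;
import java.util.List;

public class EmptyArgSketch {
    // Hypothetical helper: builds the argument vector for the cut call shown above.
    // The fifth element is the empty string that bash receives from ''.
    static List<String> buildCutCommand(String file) {
        return Arrays.asList("cut", "-f", "1,2,3,4", "--output-delimiter", "", file);
    }

    public static void main(String[] args) {
        ProcessBuilder builder = new ProcessBuilder(buildCutCommand("test.tsv"));
        // The command list still holds six elements, including the empty one.
        System.out.println(builder.command().size());
        System.out.println(builder.command().get(4).isEmpty());
    }
}
```

Calling {{builder.inheritIO().start()}} would actually run the command; that assumes a Unix-like system with GNU cut on the PATH and test.tsv in the working directory.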

Actual output:
ExecuteStreamCommand reports the cut command error in a Bulletin:
{code}
06:14:27 UTC
ERROR
fb12bb69-37e0-4e23-927c-a8aba40f360d

ExecuteStreamCommand[id=fb12bb69-37e0-4e23-927c-a8aba40f360d] Transferring flow file StandardFlowFileRecord[uuid=d94c9e62-1005-4a2d-815d-bdb4c02ebd85,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1456380578601-1, container=default, section=1], offset=231,
length=0],offset=0,name=test.tsv,size=0] to output stream. Executable command cut ended in
an error: cut: option '--output-delimiter' requires an argument
Try 'cut --help' for more information.
{code}

This is due to {{org.apache.nifi.processors.standard.util.ArgumentUtils}}:
1. Line 41: unwanted string trimming. Imagine we had used {{' '}} (a space) as the argument
separator in the previous example; the property would then look like this: Command Arguments:
{code}
"-f 1,2,3,4 --output-delimiter "
{code}
(There is a space at the end of the string: the last separator, as with the semicolon above.)
Trimming on this line destroys the last argument before the argument string is even split
into a list.
2. Line 52: if the output delimiter were {{" = "}} (space, equals, space), for example
to create some kind of .ini file, this trimming would pass only {{"="}} to the cut command.
3. Line 53: if we want to give the cut command an empty string as an argument (to join
columns), this line ignores it.
There is also a JUnit test {{org.apache.nifi.processors.standard.TestExecuteProcess:testSplitArgs}}
that asserts this wrong behavior.
4. Lines 69 and 71: trimming once again.
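
The behavior asked for in the points above can be sketched as follows. This is a minimal illustration and not the actual {{ArgumentUtils}} code: the class {{SplitArgsSketch}} and the method {{splitPreservingEmpty}} are hypothetical names, and quote handling is deliberately left out. The sketch splits on the delimiter only, never trims, and keeps empty arguments.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitArgsSketch {
    // Hypothetical helper, not the NiFi API: splits on the delimiter without
    // trimming whitespace and without dropping empty arguments.
    static List<String> splitPreservingEmpty(String input, char delimiter) {
        List<String> args = new ArrayList<>();
        // Assumption: a null or empty property means "no arguments at all".
        if (input == null || input.isEmpty()) {
            return args;
        }
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == delimiter) {
                // Close the current argument, even if it is empty.
                args.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The text after the last delimiter is an argument too (possibly empty).
        args.add(current.toString());
        return args;
    }

    public static void main(String[] args) {
        // Print each argument in brackets so empty and padded values stay visible.
        for (String a : splitPreservingEmpty("-f;1,2,3,4;--output-delimiter;", ';')) {
            System.out.println("[" + a + "]");
        }
        for (String a : splitPreservingEmpty("-f;1,2,3,4;--output-delimiter; = ", ';')) {
            System.out.println("[" + a + "]");
        }
    }
}
```

With this approach a trailing delimiter yields a final empty argument, and {{" = "}} reaches the command with its surrounding spaces intact. Treating a completely empty property as zero arguments is a design choice made for this sketch only.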

While trying to fix this bug, I also noticed an obscure QUOTE system. It is not limited to
quoting the delimiter character (which would otherwise be treated as a delimiter): QUOTES are
also removed when they do not enclose a delimiter. This quoting should be rethought and
documented. Let's at least fix the first bug reported here.


> ExecuteStreamCommand and ExecuteProcess do not support empty command line arguments
> -----------------------------------------------------------------------------------
>
>                 Key: NIFI-1562
>                 URL: https://issues.apache.org/jira/browse/NIFI-1562
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 0.5.0, 0.4.1
>            Reporter: Michal Klempa
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
