creadur-dev mailing list archives

From Raphael von der Grün (Jira) <j...@apache.org>
Subject [jira] [Updated] (RAT-265) CLI: Certain wildcard file filters do not work anymore
Date Sat, 23 Nov 2019 21:31:00 GMT

     [ https://issues.apache.org/jira/browse/RAT-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raphael von der Grün updated RAT-265:
-------------------------------------
    Description: 
Run the following command in the root of the `rat` repo:
{noformat}
java -jar apache-rat-0.14-20191120.132901-66.jar -e "*.txt" -d apache-rat-core/src/test/resources/violations
{noformat}
This will give the following output on `stderr`:
{noformat}
Will skip given exclusion '*.txt' due to java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*.txt
^
{noformat}
Furthermore, `bad.txt` will NOT be excluded from the license check.

The error that causes this is thrown in [line 132 of `org.apache.rat.Report.java`|#L132].
The reason is simple: any glob pattern that starts with `*` or `?` is not a valid regex. When
line 132 throws, the next two lines are skipped as well, so the pattern is not added
at all.
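
To illustrate the failure mode, here is a minimal standalone sketch (not the Rat code itself; the class and method names are made up for this example) that checks whether a string compiles as a `java.util.regex` pattern:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of Apache Rat.
final class GlobAsRegexDemo {

    /** Returns true if the given string compiles as a java.util.regex pattern. */
    static boolean isValidRegex(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            // A leading '*' or '?' is a quantifier with nothing to quantify.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String p : new String[] {"*.txt", "?.txt", "a*.txt"}) {
            System.out.println(p + " -> valid regex: " + isValidRegex(p));
        }
    }
}
```

Note that `a*.txt` does compile, but as a regex it means something different from the glob (zero or more `a`, then any one character, then `txt`).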

Unfortunately, the solution to this problem is not so simple. In `v0.12`, the `-e` option always
added wildcard filters, while `-E` always added regex filters. The documentation still states
the same in the latest `v0.14` snapshot. Beginning with `v0.13`, the code tries to add every
exclude rule as three different filters (name, wildcard, and regex). I believe this approach is inherently flawed.

First, `new NameFileFilter(exclusion)` is redundant if we also add `new WildcardFileFilter(exclusion)`.
The files matched by the `NameFileFilter` are a subset of those matched by the `WildcardFileFilter`,
since any magic character (i.e. `?` or `*`) in `exclusion` also matches itself when used
in a `WildcardFileFilter`.
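
The subset relation can be checked with a small sketch. It does not use commons-io: `wildcardToRegex` is a hypothetical stand-in that treats only `*` and `?` as wildcards, which is an assumption about how `WildcardFileFilter` handles such patterns.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; a simplified model of name vs. wildcard matching.
final class WildcardSubsetDemo {

    /** Translates a pattern whose only wildcards are '*' and '?' into a regex. */
    static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");          // any run of characters
            } else if (c == '?') {
                sb.append('.');           // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    /** Models an exact-name filter. */
    static boolean matchesByName(String exclusion, String fileName) {
        return exclusion.equals(fileName);
    }

    /** Models a wildcard filter. */
    static boolean matchesByWildcard(String exclusion, String fileName) {
        return wildcardToRegex(exclusion).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String exclusion = "a?c";
        for (String name : new String[] {"a?c", "abc"}) {
            System.out.println(name + ": name=" + matchesByName(exclusion, name)
                    + ", wildcard=" + matchesByWildcard(exclusion, name));
        }
    }
}
```

Whenever the name comparison succeeds, the wildcard comparison succeeds too, because `?` and `*` each match themselves; the reverse does not hold.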

So let's assume we only register the `WildcardFileFilter` and the `RegexFileFilter`. Even
if we correctly register patterns that are not valid regexes as wildcard filters only, there are still
patterns where we cannot tell what the user intended. Consider the pattern `bi.ini`.
Should it be interpreted as a wildcard pattern and match only itself, or should it be interpreted
as a regex and also match `bikini`, for example?
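
The ambiguity is easy to reproduce with the JDK alone. In this sketch, the `glob:` `PathMatcher` from `java.nio.file` stands in for the wildcard filter (an assumption; its glob syntax is not identical to commons-io wildcards), and the class and method names are hypothetical:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Apache Rat.
final class AmbiguousPatternDemo {

    /** Interprets the pattern as a regex that must match the whole file name. */
    static boolean matchesAsRegex(String pattern, String fileName) {
        return Pattern.matches(pattern, fileName);
    }

    /** Interprets the pattern as a glob, using the JDK's PathMatcher as a stand-in. */
    static boolean matchesAsGlob(String pattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String pattern = "bi.ini";
        System.out.println(matchesAsGlob(pattern, "bi.ini"));  // true
        System.out.println(matchesAsGlob(pattern, "bikini"));  // false
        System.out.println(matchesAsRegex(pattern, "bi.ini")); // true
        System.out.println(matchesAsRegex(pattern, "bikini")); // true
    }
}
```

Both readings are valid for the same input string, so the code cannot infer which one the user meant.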

My recommendation for a quick patch solution would be to go back to the exclusion behavior
of `v0.12`.

Beyond that, the nicest solution IMHO would be support for ignore files with the same semantics
as `.gitignore` (via `-E`) and support for giving extended shell globs via `-e`.

  was:
Run the following command in the root of the `rat` repo:
{noformat}
java -jar apache-rat-0.14-20191120.132901-66.jar -e "*.txt" -d apache-rat-core/src/test/resources/violations/bad.txt
{noformat}
This will give the following output on `stderr`:
{noformat}
Will skip given exclusion '*.txt' due to java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*.txt
^
{noformat}
Furthermore, `bad.txt` will NOT be excluded from the license check.

The error that causes this is thrown in [line 132 of `org.apache.rat.Report.java`|#L132].
The reason is simple: any glob pattern that starts with `*` or `?` is not a valid regex. When
line 132 throws, the next two lines are skipped as well, so the pattern is not added
at all.

Unfortunately, the solution to this problem is not so simple. In `v0.12`, the `-e` option always
added wildcard filters, while `-E` always added regex filters. The documentation still states
the same in the latest `v0.14` snapshot. Beginning with `v0.13`, the code tries to add every
exclude rule as three different filters (name, wildcard, and regex). I believe this approach is inherently flawed.

First, `new NameFileFilter(exclusion)` is redundant if we also add `new WildcardFileFilter(exclusion)`.
The files matched by the `NameFileFilter` are a subset of those matched by the `WildcardFileFilter`,
since any magic character (i.e. `?` or `*`) in `exclusion` also matches itself when used
in a `WildcardFileFilter`.

So let's assume we only register the `WildcardFileFilter` and the `RegexFileFilter`. Even
if we correctly register patterns that are not valid regexes as wildcard filters only, there are still
patterns where we cannot tell what the user intended. Consider the pattern `bi.ini`.
Should it be interpreted as a wildcard pattern and match only itself, or should it be interpreted
as a regex and also match `bikini`, for example?

My recommendation for a quick patch solution would be to go back to the exclusion behavior
of `v0.12`.

Beyond that, the nicest solution IMHO would be support for ignore files with the same semantics
as `.gitignore` (via `-E`) and support for giving extended shell globs via `-e`.


> CLI: Certain wildcard file filters do not work anymore
> ------------------------------------------------------
>
>                 Key: RAT-265
>                 URL: https://issues.apache.org/jira/browse/RAT-265
>             Project: Apache Rat
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 0.13, 0.14
>            Reporter: Raphael von der Grün
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
