community-commits mailing list archives

From "Sebb (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (COMDEV-156) parseprojects.py: Calculation of projectJsonFilename is flawed
Date Tue, 15 Sep 2015 08:21:45 GMT

     [ https://issues.apache.org/jira/browse/COMDEV-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated COMDEV-156:
------------------------
    Description: 
The parseprojects.py script calculates the projectJsonFilename variable from the DOAP homepage
entry.  If the homepage has path components after tlp.apache.org then the last one is used
and appended to the tlp.
This process is not guaranteed to result in a unique file name. For example the entry
<homepage rdf:resource="http://commons.apache.org/beanutils/index.html"/>
is converted to
commons-index.html
It so happens that the other Commons DOAPs don't include the index.html so they are unique,
but this is just chance. There are other ways that this approach can fail as there is no standard
convention for project homepage URLs within a TLP website (nor should one be enforced).
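
To make the collision concrete, here is a minimal sketch of the homepage-based naming as described above. It is a reconstruction from this description only, not the actual parseprojects.py code; the function name filename_from_homepage and the second URL (a "lang" homepage ending in index.html) are hypothetical and exist only to show two projects mapping to one name.

```python
from urllib.parse import urlparse


def filename_from_homepage(homepage):
    """Hypothetical reconstruction: TLP name plus the last path component."""
    parsed = urlparse(homepage)
    # "commons" from "commons.apache.org"
    tlp = parsed.hostname.split(".")[0]
    # Non-empty path components after the host name
    parts = [p for p in parsed.path.split("/") if p]
    # Only the last component is kept, so the project directory is lost
    return tlp + "-" + parts[-1] if parts else tlp


# The entry quoted in this issue
print(filename_from_homepage("http://commons.apache.org/beanutils/index.html"))
# Hypothetical second project whose homepage also ends in index.html
print(filename_from_homepage("http://commons.apache.org/lang/index.html"))
```

Both calls yield the same name, because the only component that distinguishes the two projects is discarded.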

It would be a lot simpler (and more reliable) to use the DOAP name field.
Trim the leading "Apache", convert to lower case, remove or replace spaces, and sanitize illegal
filename characters.
[The original code only allowed alphanumeric characters plus '-' and '+'. Everything else
was converted to '_'.]

There may still be duplicate names, but that is an issue for the project to resolve as that
is not allowed (and the code can report duplicates).
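
The name-based approach could look roughly like the sketch below. This is an illustration under assumptions, not a patch: the function names filename_from_name and find_duplicates are hypothetical, spaces are replaced with '-' (the description leaves open whether to remove or replace them), and the allowed character set follows the note about the original code (alphanumerics plus '-' and '+', everything else becomes '_').

```python
import re
from collections import defaultdict


def filename_from_name(name):
    """Derive a file name from a DOAP name field (hypothetical helper)."""
    # Trim a leading "Apache" and surrounding whitespace
    name = re.sub(r"^Apache\s+", "", name.strip())
    # Lower-case and replace spaces (assumption: '-' as the replacement)
    name = name.lower().replace(" ", "-")
    # Keep alphanumerics plus '-' and '+'; anything else becomes '_'
    return re.sub(r"[^a-z0-9+-]", "_", name)


def find_duplicates(names):
    """Group DOAP names by derived file name and return only the clashes."""
    seen = defaultdict(list)
    for doap_name in names:
        seen[filename_from_name(doap_name)].append(doap_name)
    return {fname: group for fname, group in seen.items() if len(group) > 1}


print(filename_from_name("Apache Commons BeanUtils"))
print(find_duplicates(["Apache Foo Bar", "Apache foo-bar", "Apache Baz"]))
```

Reporting the clashes rather than silently disambiguating them keeps the responsibility with the projects, as suggested above.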

  was:
The parseprojects.py script calculates the projectJsonFilename variable from the DOAP homepage
entry.  If the homepage has path components after tlp.apache.org then the last one is used
and appended to the tlp.
This process is not guaranteed to result in a unique file name. For example the entry
<homepage rdf:resource="http://commons.apache.org/beanutils/index.html"/>
is converted to
commons-index.html
It so happens that the other Commons DOAPs don't include the index.html so they are unique,
but this is just chance. There are other ways that this approach can fail as there is no standard
convention for project homepage URLs within a TLP website (nor should one be enforced).

It would be a lot simpler (and more reliable) to use the DOAP name field.
Trim the leading Apache, convert to lower case, remove spaces and sanitize illegal filename
characters.

There may still be duplicate names, but that is an issue for the project to resolve as that
is not allowed (and the code can report duplicates).


> parseprojects.py: Calculation of projectJsonFilename is flawed
> --------------------------------------------------------------
>
>                 Key: COMDEV-156
>                 URL: https://issues.apache.org/jira/browse/COMDEV-156
>             Project: Community Development
>          Issue Type: Bug
>          Components: Projects New
>            Reporter: Sebb
>
> The parseprojects.py script calculates the projectJsonFilename variable from the DOAP
> homepage entry.  If the homepage has path components after tlp.apache.org then the last one
> is used and appended to the tlp.
> This process is not guaranteed to result in a unique file name. For example the entry
> <homepage rdf:resource="http://commons.apache.org/beanutils/index.html"/>
> is converted to
> commons-index.html
> It so happens that the other Commons DOAPs don't include the index.html so they are unique,
> but this is just chance. There are other ways that this approach can fail as there is no standard
> convention for project homepage URLs within a TLP website (nor should one be enforced).
> It would be a lot simpler (and more reliable) to use the DOAP name field.
> Trim the leading "Apache", convert to lower case, remove or replace spaces, and sanitize illegal
> filename characters.
> [The original code only allowed alphanumeric characters plus '-' and '+'. Everything
> else was converted to '_'.]
> There may still be duplicate names, but that is an issue for the project to resolve as
> that is not allowed (and the code can report duplicates).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
