maven-issues mailing list archives

From Jörg Hohwiller (JIRA) <j...@apache.org>
Subject [jira] [Commented] (MRESOLVER-90) HTML content in POM: Maven should validate content before storing in local repo
Date Thu, 01 Aug 2019 07:19:00 GMT

    [ https://issues.apache.org/jira/browse/MRESOLVER-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897832#comment-16897832 ]

Jörg Hohwiller commented on MRESOLVER-90:
-----------------------------------------

> > You could change the default so checksums are validated by default


 > I tried, it was pulled back for compat reasons. I will retry for 3.7.0.

Awesome. Sounds great. Fingers crossed for 3.7.0.

> > You could first download the checksums. If a downloaded checksum contains HTML,
it is not a checksum, and any further download for that artifact could be aborted right away
with an error.


 > What if the checksum file contains just {{123}} or something else, but not HTML?

Well, either you do a specific validation for checksums that ignores leading and trailing
whitespace and otherwise accepts only a single alphanumeric word, or you are pragmatic and do
not care about the rest (see next point).
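
A minimal sketch of the strict variant, just to illustrate the idea. The class and method
names are made up for this example and are not part of Maven Resolver. Note that a checksum
file that also lists a file name after the digest would be rejected by this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an existing Maven Resolver class.
final class ChecksumSanityCheck {

    // A single alphanumeric word, e.g. a hex-encoded SHA-1 digest.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Returns true if the content, after trimming surrounding whitespace,
    // consists of exactly one alphanumeric word.
    static boolean looksLikeChecksum(String content) {
        if (content == null) {
            return false;
        }
        return WORD.matcher(content.trim()).matches();
    }
}
```

With this check, {{123}} would still pass, but an HTML login page would not.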

> > You could try to detect whether the content is HTML (which is quite easy). Assuming the
type is not "html" or "xhtml", you could consider it invalid


> Content type or sniffing?

Sniffing. Content types have the same problem as HTTP status codes with form login. In an
ideal world they are reliable and correct. However, Firefox still insists on showing the raw
content of HTML files or SVGs if the content type is not exactly right. This is correct
according to the specification and from an academic point of view. However, it is a pain for
end users. Ever tried to place SVGs in a GitHub wiki? It would be much smarter of Firefox to
show the content properly but raise a warning icon somewhere to still inform the makers that
they are doing something wrong.
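
What I mean by sniffing could look roughly like the following sketch. The class name and the
two markers it searches for are my own assumptions for illustration; the caller decides how
many leading bytes to pass in. The login page from the issue description starts with a long
comment, so the prefix would need to be large enough to reach the doctype. A file that
legitimately contains the text "<html" somewhere in that prefix would be flagged as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not an existing Maven Resolver class.
final class HtmlSniffer {

    // Returns true if the given leading bytes of a download contain an HTML
    // doctype or an opening <html tag (case-insensitive).
    static boolean looksLikeHtml(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so decoding never fails
        // and ASCII markers stay intact.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return text.contains("<!doctype html") || text.contains("<html");
    }
}
```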

> > You could at least add a validation for POM files. We know that POM files are XML
and we even have a parser that can validate a POM. Therefore for POMs we could reject entirely
invalid content before putting it persistently into the local repo


 > The POMs are already parsed by the model builder/parser, and this would cause duplicate
processing, which will impact performance.

Of course it would be tricky to do it such that it is not parsed twice, but it is still doable.
Anyhow, it might already be efficient to scan the first 512 bytes and check that the root tag
matches, using just a string lookahead.
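
A rough sketch of that lookahead, assuming Java 11+ for {{InputStream.readNBytes}}. The class
and method names are hypothetical. A POM with a license header longer than 512 bytes before
the root element, or one stored in UTF-16, would be rejected by this simple version, so the
limit and the decoding would need more thought.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not an existing Maven Resolver class.
final class PomPrefixCheck {

    // Number of leading bytes to inspect, as suggested above.
    static final int LOOKAHEAD = 512;

    // Returns true if the prefix contains the start of a <project root tag.
    static boolean prefixHasProjectRoot(byte[] prefix) {
        String text = new String(prefix, StandardCharsets.ISO_8859_1);
        return text.contains("<project");
    }

    // Reads at most LOOKAHEAD bytes from the file and checks them.
    static boolean looksLikePom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return prefixHasProjectRoot(in.readNBytes(LOOKAHEAD));
        }
    }
}
```

This avoids a second full parse: the download would only be rejected early when the prefix
clearly is not a POM, and the model builder would still do the real validation later.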

> Please look at {{org.eclipse.aether.connector.basic.BasicRepositoryConnector.get(Collection<?
extends ArtifactDownload>, Collection<? extends MetadataDownload>)}} as well as the
{{org.eclipse.aether.connector.basic.BasicRepositoryConnector.GetTaskRunner.fetchChecksum(URI,
File)}}.
 > This is a starting point to improve things.

Thanks for pointing this out. I will have a look.

> HTML content in POM: Maven should validate content before storing in local repo
> -------------------------------------------------------------------------------
>
>                 Key: MRESOLVER-90
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-90
>             Project: Maven Resolver
>          Issue Type: New Feature
>    Affects Versions: 1.4.0
>         Environment: both with maven 3.6.0 in CMD or in Eclipse 4.9.0
>            Reporter: Jörg Hohwiller
>            Priority: Major
>
> For some odd reason, errors sometimes just happen and a Maven repo delivers an HTML
error or login page for a request of a POM or JAR file. It seems that if the status code is
valid, Maven (or whatever is under the hood, maybe even Aether?) saves the result
without any sanity check or validation.
> Therefore I frequently end up with "POM" or "JAR" files in my local repo that are not
XML but HTML nonsense.
>  
> Example:
> {code:java}
> <!--
>    DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
>  
>     Copyright (c) 2007 Sun Microsystems Inc. All Rights Reserved
>  
>     The contents of this file are subject to the terms
>     of the Common Development and Distribution License
>     (the License). You may not use this file except in
>     compliance with the License.
>     You can obtain a copy of the License at
>     https://opensso.dev.java.net/public/CDDLv1.0.html or
>     opensso/legal/CDDLv1.0.txt
>     See the License for the specific language governing
>     permission and limitations under the License.
>     When distributing Covered Code, include this CDDL
>     Header Notice in each file and include the License file
>     at opensso/legal/CDDLv1.0.txt.
>     If applicable, add the following below the CDDL Header,
>     with the fields enclosed by brackets [] replaced by
>     your own identifying information:
>     "Portions Copyrighted [year] [name of copyright owner]"
>     $Id: index.html,v 1.2 2008/06/25 05:48:51 qcheng Exp $
> -->
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> <html>
> <head>
> <title>Please Wait While Redirecting to Login page</title>
> <script language="JavaScript"> <!--
> function redirectToAuth() {
>     var params = getQueryParameters();
>     var url = 'UI/Login';
>     if (params != '') {
>         url += params;
>     }
>     top.location.replace(url);
> }
> function getQueryParameters() {
>     var loc = '' + location;
>     var idx = loc.indexOf('?');
>     if (idx != -1) {
>         return loc.substring(idx);
>     } else {
>         return '';
>     }
> }
> //-->
> </script>
> </head>
> <body bgcolor="#FFFFFF" onLoad="redirectToAuth();">
> </body>
> </html>
> {code}
> I would expect Maven to verify the content before officially placing it in the correct
location inside the local Maven repository on my disk.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
