From: Jörg Hohwiller (JIRA)
To: issues@maven.apache.org
Reply-To: dev@maven.apache.org
Date: Thu, 1 Aug 2019 07:19:00 +0000 (UTC)
Subject: [jira] [Commented] (MRESOLVER-90) HTML content in POM: Maven should validate content before storing in local repo

    [ https://issues.apache.org/jira/browse/MRESOLVER-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897832#comment-16897832 ]

Jörg Hohwiller commented on MRESOLVER-90:
-----------------------------------------

> > You could change the default so checksums are validated by default
> I tried, it was pulled back for compat reasons. I will retry for 3.7.0.

Awesome. Sounds great. Fingers crossed for 3.7.0.

> > You could first download the checksums. If the downloaded checksum contains HTML, it is not a checksum, and any further download for that artifact could already be aborted with an error.
> What if the checksum file contains just {{123}} or something else, but not HTML?

Well, either you do a specific validation for checksums that ignores leading and trailing whitespace and otherwise only accepts an alphanumeric word, or you are pragmatic and do not care about the rest (see next point).

> > You could try to detect if the content is HTML (which is quite easy). Assuming the type is not "html" or "xhtml", you could consider it invalid
> Content type or sniffing?

Sniffing. Content types have the same problem as HTTP status codes with form login. In an ideal world they are reliable and correct. However, Firefox still insists on showing the raw content of HTML files or SVGs if the content type is not perfectly right. This is correct from the specification and an academic point of view. However, it is a pain for end users. Ever tried to place SVGs in a GitHub wiki? It would be much smarter of Firefox to show the content properly but raise a warning icon somewhere to still inform the makers that they are doing something wrong.

> > You could at least add a validation for POM files. We know that POM files are XML and we even have a parser that can validate a POM. Therefore, for POMs we could reject entirely invalid content before putting it persistently into the local repo
> The POMs are already parsed by the model builder/parser and this would cause duplicate processing tasks, which will impact performance.

Of course it would be tricky to do it such that it is not parsed twice, but it is still doable.
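
To make the checksum validation and the sniffing from the points above a bit more concrete, here is a minimal sketch in Java. It is not taken from Maven Resolver: the class name DownloadSanityCheck, both method names, the HTML markers and the 512-byte sniff limit are assumptions chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Maven Resolver. */
final class DownloadSanityCheck {

    // Assumption: a checksum file is a single alphanumeric word,
    // optionally surrounded by whitespace.
    private static final Pattern CHECKSUM_WORD = Pattern.compile("\\s*[0-9A-Za-z]+\\s*");

    // Assumption: number of leading bytes inspected when sniffing.
    private static final int SNIFF_LIMIT = 512;

    private DownloadSanityCheck() {
    }

    /** Returns true if the content has the shape of a bare checksum. */
    static boolean looksLikeChecksum(String content) {
        return CHECKSUM_WORD.matcher(content).matches();
    }

    /** Returns true if the leading bytes contain an HTML doctype or an html tag. */
    static boolean looksLikeHtml(byte[] content) {
        int length = Math.min(content.length, SNIFF_LIMIT);
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String head = new String(content, 0, length, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return head.contains("<!doctype html") || head.contains("<html");
    }
}
```

Note that looksLikeChecksum("123") still returns true, so this only checks the shape of the content, not whether it is a real checksum. Checksum files that carry a file name after the hash would be rejected by this sketch.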
Anyhow, it might already be efficient to scan the first 512 bytes and check that the root tag matches with just a string lookahead.

> Please look at {{org.eclipse.aether.connector.basic.BasicRepositoryConnector.get(Collection, Collection)}} as well as the {{org.eclipse.aether.connector.basic.BasicRepositoryConnector.GetTaskRunner.fetchChecksum(URI, File)}}.
> This is a starting point to improve things.

Thanks for pointing this out. I will have a look.

> HTML content in POM: Maven should validate content before storing in local repo
> -------------------------------------------------------------------------------
>
>                 Key: MRESOLVER-90
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-90
>             Project: Maven Resolver
>          Issue Type: New Feature
>    Affects Versions: 1.4.0
>         Environment: both with maven 3.6.0 in CMD or in Eclipse 4.9.0
>            Reporter: Jörg Hohwiller
>            Priority: Major
>
> For some odd reason, errors sometimes just happen and a Maven repo delivers an HTML error or login page for a request of a POM or JAR file. It seems that if the status code is valid, Maven (might be anything under the hood, maybe even Aether?) saves the result without any sanity check or validation.
> Therefore I frequently end up with "POM" or "JAR" files in my local repo that are not XML but HTML nonsense.
>
> Example:
> {code:java}
>
>
>
>
> Please Wait While Redirecting to Login page
>
>
>
>
>
> {code}
> I would expect Maven to verify the content before officially placing it in the correct location inside the local Maven repository on my disk.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)