maven-users mailing list archives

From Stephen Connolly <stephen.alan.conno...@gmail.com>
Subject Re: Source code verification/compliance with Maven?
Date Fri, 10 May 2013 10:21:47 GMT
On 10 May 2013 10:09, Daniel Pocock <daniel@pocock.com.au> wrote:

> On 09/05/13 17:21, Manfred Moser wrote:
> > Correct. The other thing you want to ensure is your acquisition of the
> > jar. With Nexus (and other repo managers probably) you can connect to the
> > https secured version of Central and enforce checksum verification so you
> > can be sure that any component you get into the repo manager is the same
> > as upstream.
> Authenticity of the binary is just one element in the big picture
> though.  The requirement I have involves various factors, some of which
> appear to be addressed by CLM.
>
> - some plugin that is essential to the build process is only available
> in binary format.  The developer has put a time restriction on the
> binary (e.g. it stops working in 2014) as they plan to start charging
> money for it in future, but no user is aware of this restriction until
> it's too late.
>

I can do that in obfuscated code just as easily... doesn't even need to be
that obfuscated...

Personally speaking, though, if some timebomb like that landed on me I
would *never* use anything from that developer again... so applying the do
unto others principle, I do not support the practice.


> - a user wants to run their own on-site code analysis tool, and as the
> quality of the tool improves, they want to re-run it over all their open
> source code from time to time as well
>

In the JVM world, the best code analysis tools analyse the compiled
bytecode. This will be especially true moving forward on the JVM as it
becomes more polyglot... Clojure produces .class files too you know.

Code analysis tools that only process .java files limit themselves... if
you analyse at the bytecode level you have a simpler parser (no need to
worry about each language's syntax) and you can support the polyglot
ecosystem.


> - a user stresses a particular part of the code that has never been used
> in anger before, and they need to profile and optimize that code with
> some tools that require an original source tree
>

Those sound like crap tools.

* I cannot use them to profile/optimize my production code that has had
debug information stripped
* I cannot use them to profile the Clojure library that I am using
* I cannot use them to profile the Scala library that I am using
* I cannot use them to profile the Kotlin library that I am using
* ...

I am not going to use those tools. ;-)


>
> As I see it, CLM may provide the foundation for addressing these
>

I was tempted to mention the CLM stuff that Sonatype adds, but I felt I
would then need to go and research the other vendors in the space and give
them a mention too (I've heard something about ducks sleeping with frogs,
but I have not seen what the offspring look like, for example)... but I
decided against it as it did not address the full set of requirements you
mentioned up front.


> concerns (e.g. by making sure that all components provide sources), but
> the most prudent users still potentially need a convenient way to repeat
> the build(s) locally on demand in a clean, offline environment.
>
> Does Sonatype offer a public CLM demo of some multi-component open
> source project, e.g. to show how CLM reports on Eclipse and perhaps some
> application server like Apache ServiceMix?
>
> The other thing that caught my attention is the Maven SCM capability,
> and the <scm/> element in pom.xml - is this only used for making tags
> after a build?  Or does any tool use this information to recursively
> check-out and build dependency projects?
>

Keep in mind that it is not only Maven that creates pom.xml files for
upload to Central.

There is some validation of the pom.xml files to ensure that there is basic
information, e.g. you must have a value in the <scm> section, etc...
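
For illustration, a pom.xml that passes that basic check has something of
this shape in it (the URLs here are made up):

<scm>
  <connection>scm:git:https://example.org/repos/widget.git</connection>
  <developerConnection>scm:git:https://example.org/repos/widget.git</developerConnection>
  <url>https://example.org/repos/widget</url>
</scm>

The validation only checks that the values are present, not that they are
meaningful.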

Have a look at org.apache.cassandra:cassandra-all:1.2.4 for example (
http://repo1.maven.org/maven2/org/apache/cassandra/cassandra-all/1.2.4/cassandra-all-1.2.4.pom
)

That has a <scm> section... you might think that checking that out would
give you the build of that .jar? Well, yes it will... but not without some
digging... there is no mention of the branch/tag/hash that the code is
based on... no mention that this is actually a project built with ANT...
and no mention that the actual source bundle is
http://repo1.maven.org/maven2/org/apache/cassandra/apache-cassandra/1.2.4/apache-cassandra-1.2.4-src.tar.gz
(Side note: this was the best pom.xml that I could get the C* guys to agree
to publish, and it took a bit of work to help them get their artifacts into
Central.)

There are people uploading their projects that they build with
ANT/Gradle/Buildr/Rake/SBT/Leiningen... you could even build your software
with a Windows Batch file and generate a pom.xml to explain the
dependencies and then upload that and the bundle of deployed artifacts to
Central.
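
For example, a perfectly valid (if hypothetical) pom.xml for such an
artifact might do nothing more than declare the coordinates and
dependencies... no build section at all:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>widget</artifactId>
  <version>1.0</version>
  <!-- the artifact is built by some other tool entirely; this pom
       exists only to declare what it depends on -->
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>gizmo</artifactId>
      <version>2.0</version>
    </dependency>
  </dependencies>
</project>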

Deployment on Central does not imply built by Maven... just as built by
Maven does not imply deployment on Central...

Central's validations *cannot* ever include validating that the <scm> tag
actually builds the produced artifacts:
* It would take too long
* It would need fuzzy matching, e.g.
    - compare .jar files ignoring file date stamps
    - for META-INF/MANIFEST.MF files, ignore the build date in the contents
    - for generated .properties files that include the build date in the
      comments, ignore those differences
    - for JCE implementations, ignore that Central cannot sign as it does
      not have the code signing keys
  And that's just .jar files
* It would need an insane collection of build tool chains

That same argument applies to forcing .src.tar.gz or .source-release.zip
artifacts.

The fact is that we just have to *trust blindly* that the information in
the .pom is correct...

Now does that mean you have to stay blind? Nope, not at all... I think
there could be value in somebody setting up an index of GAVs that are
trusted by people... that way you could build a web of trust in a GAV's
binary artifacts being "safe"...
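
In the meantime, the checksum verification Manfred mentioned is something
you can script yourself today... a minimal sketch (it only proves your
local copy matches the bytes Central serves, not that those bytes are what
the <scm> info claims):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class VerifyArtifact {

  public static void main(String[] args) throws Exception {
    // args[0] is the downloaded artifact; Central publishes the expected
    // digest next to it as <artifact>.sha1
    byte[] jar = Files.readAllBytes(Paths.get(args[0]));
    byte[] digest = MessageDigest.getInstance("SHA-1").digest(jar);
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    String expected = new String(
        Files.readAllBytes(Paths.get(args[0] + ".sha1")), "US-ASCII")
        .trim().split("\\s+")[0];
    System.out.println(hex.toString().equalsIgnoreCase(expected)
        ? "checksum OK" : "checksum MISMATCH");
  }
}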


> In Stephen's earlier reply, he commented on the difficulty of
> recognizing embedded code (e.g. class files or JARs).  I don't think
> this is as bad as it sounds though: a quick check of the signature of a
> file is enough to determine if it is a PNG or a JAR or something
> unknown.  A recursive scan of sources would probably classify each of
> those artifacts for a different action.  e.g. *.png could be
> whitelisted, while source repositories that keep binary copies of some
> build tools would generate an alert for manual attention.
>

Hey, I could Base64-encode the binary as a private static final String[]
in the .java source (or just use a raw byte[] literal, as below)... decode
that binary and use a custom classloader to load the real code, e.g.

public interface SuperWidget {
  WonkaBar doMagic(Chocolate c, OopaLoompaProvider p);
}

public final class SuperWidgetFactory {

  private SuperWidgetFactory() {}

  public static SuperWidget build() {
    return new SuperWidgetImpl();
  }

}

That is the API I am working to, but obviously I don't want Slugworth to
see the inside of the factory, even if I am making the code available...

So I just have SuperWidgetImpl() do something like

class SuperWidgetImpl implements SuperWidget {

  private final SuperWidget delegate;

  public WonkaBar doMagic(Chocolate c, OopaLoompaProvider p) {
    return delegate.doMagic(c, p);
  }

  // the real implementation, hidden as bytecode in a constant
  private static final byte[] $sauce = {
    (byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe, ...
  };

  SuperWidgetImpl() {
    try {
      ClassLoader c = new ByteArrayClassLoader(
          SuperWidgetImpl.class.getClassLoader(), $sauce);
      delegate = (SuperWidget) Class.forName("Wonka", true, c).newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

}
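
By the way, ByteArrayClassLoader is not a JDK class... it is the sort of
thing you write yourself in a dozen lines. A minimal sketch, assuming the
embedded bytes are a single class named "Wonka":

class ByteArrayClassLoader extends ClassLoader {

  // the raw bytes of the hidden .class file
  private final byte[] bytes;

  ByteArrayClassLoader(ClassLoader parent, byte[] bytes) {
    super(parent);
    this.bytes = bytes;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    if ("Wonka".equals(name)) {
      return defineClass(name, bytes, 0, bytes.length);
    }
    throw new ClassNotFoundException(name);
  }
}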

Now there is nothing you can do short of looking at the source to find
that, so that demonstrates the first side of the strawman:

1. The absence of binary files in the -sources.jar does not indicate that
-sources are complete

For the second side of the strawman, we have a very simple example:

Lookup tables.

Say the API I am using has a rather complex set of lookup tables, and say
I don't want to include them in the .class file (for one thing, the .class
file would break the size limits), so instead I encode the lookup tables
as a binary resource. Those tables are in src/main/resources, so they will
end up in the -sources.jar... and they are quite big, and I am not
compressing the code .jar, so instead I package these tables as a .jar,
but one that is compressed, so that when embedded in the code .jar the
size is minimized... or maybe I compress them as .tar.gz, or .bz2, or ...
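
To make that scenario concrete, a minimal sketch of the consuming code
(the resource name and the binary format are made up for the example):

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class LookupTables {

  // reads the tables from a compressed binary resource inside the .jar
  static int[] load() throws IOException {
    try (InputStream raw =
             LookupTables.class.getResourceAsStream("/tables.bin.gz");
         DataInputStream in =
             new DataInputStream(new GZIPInputStream(raw))) {
      int length = in.readInt(); // assumed length header
      int[] table = new int[length];
      for (int i = 0; i < length; i++) {
        table[i] = in.readInt();
      }
      return table;
    }
  }
}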

That should demonstrate the second side of the strawman:

2. The presence of binary files in the -sources.jar does not indicate that
-sources are incomplete.

So we have established that there is no definitive test of complete
sources... if we find binary files in the -sources.jar, that just means we
have to look at the actual source code to see whether it is actually
incomplete... if we find no binary files in the -sources.jar, that just
means we have to look at the actual source code to see whether it is
hiding binaries as static constants in the .class files (or hiding
binaries as "dead code" methods in the .class files via some encoding
scheme or other - think: load the bytecode for the method and run it
through a function to pull out the real bytes... not efficient, but you
can hide whatever you want in that... decompile the bytecode and there's
the "source").

Earlier I pointed out that having sources is no guarantee of the
usefulness of those sources... since they can just be generated by a
decompiler running on the bytecode... well, there is also no guarantee
that complete sources give you the complete artifacts without knowledge of
how the build works.

You can have a build tool that processes one file and produces resources
that get included in the .jar... so by the strict definition of what is
source code, the one file that gets processed is source, the generated
resources are not, so we don't include the generated resources in the
-sources.jar (in fact we may not even include the source file, e.g. see
Modello in the Maven plugins).

BUT remember -sources.jar is for IDEs, not for building, so when we
generate source code as part of a Maven build, we include the generated
source code in the -sources.jar *because* the whole point of those files
is to allow IDEs to let you debug into methods... so for that purpose it
makes *perfect* sense.

TL;DR

-sources.jar is for IDEs, not for building.

The ones that are meant for building are conventionally called
-source-release.zip



>
>
> > This avoids the problem of building from source as you suggest since this
> > is very often extremely difficult (just ask the Debian folks or other
> > Linux distros that try to rebuild everything from sources... it's a HUGE
> > task).
> I am one of those Debian folks and I do provide all my own packages in a
> manner that enables people to repeat the build, add patches, etc in a
> very uniform manner - although bringing Maven-based projects into Debian
> wasn't the reason for my query in this instance.
>

It would be nice if *everyone* could provide all their own stuff in a
manner that enables people to repeat the build, add patches, etc.

We sadly do not live in that world.


>
>
> > Sonatype CLM includes integration with Nexus, Eclipse IDE and
> > Jenkins/Hudson as well as a backend server to define rules and more.
> >
> > manfred
> >
> > PS: Disclaimer I work with Sonatype on their documentation ..
> >
> >
> >> Well, I know that Sonatype has a product they have been pretty
> >> aggressive with called CLM.
> >>
> >> CLM shows both vulnerabilities and license threats -- including
> >> undefined licenses...  Perhaps that is what you need?
> >>
> >>
> >> Thanks,
> >>
> >> Roy Lyons
> >>
> >>
> >>
> >>
> >>
> >> On 5/9/13 4:15 AM, "Daniel Pocock" <daniel@pocock.com.au> wrote:
> >>
> >>> Hi,
> >>>
> >>> There is a lot of confusion about the distinction between software that
> >>> is free (like malware in app stores) and software that is really free
> >>> with open source code.
> >>>
> >>> Several people have asked me how they can be sure that a Maven build
> >>> (including all downloaded plugins) only uses genuine open source
> >>> software, and that the binary downloads are identical to the source
> >>> releases.  There are many users that want to build projects from source
> >>> code in clean, non-networked environments.
> >>>
> >>> How can somebody tell Maven to
> >>> a) recursively download source JARs for all plugins and dependencies
> >>> (and their build plugins) and compile them one by one?
> >>> b) stop if any source JAR contains binary artifacts or if a
> >>> dependency/plugin source is not available?
> >>> c) put all downloaded source in some kind of tree where it can be
> >>> tarred up, copied onto a DVD and then built by a machine that is
> >>> offline?
> >>>
> >>> I'm aware of the command "mvn dependency:sources", but this only
> >>> appears to fetch the sources on a best effort basis and doesn't
> >>> appear to compile them.
> >>>
> >>> Regards,
> >>>
> >>> Daniel
