pdfbox-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PDFBOX-3155) org.apache.pdfbox.util.PDFTextStripper class initialization throws NumberFormatException with recent Verona-enabled Java 9 JVMs
Date Tue, 08 Dec 2015 20:55:11 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047445#comment-15047445 ]

Uwe Schindler edited comment on PDFBOX-3155 at 12/8/15 8:54 PM:
----------------------------------------------------------------

Hi,

bq. Is this "planned" or already decided?

It was planned, and the JEP has existed since October 2015. The code has been merged into the main
JDK repository and accepted by the community. It is now part of public beta testing; otherwise we
would not have seen this bug. There is no way back :-)

FYI, Jigsaw (the module system of Java 9) is not yet merged and therefore not finally decided,
but we have already tested the Jigsaw pre-pre-builds with Lucene, Solr, and Elasticsearch and
confirmed that everything works from Lucene 5.4 on.

To your code: I would suggest simplifying the whole thing. There are two options (you only want
to detect Java 7 as a minimum, or the other way round if it's still Java 6):

# Use {{java.specification.version}}; this is the property you should be looking at. It has had a
standardized format, consisting only of digits and dots, since Java 1.0! :-) By contrast,
{{java.version}} is the version of the installed JDK/JRE package; its version number can be any
string and is NOT standardized. The specification version is also what you actually care about,
because it defines what the API can do. Lucene uses the following code to detect the Java version,
which has been unchanged for a long time (the last updates just added Java 9): <https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/util/Constants.java#L56-L85>
(it did not break, because it checks a standardized string format).
# Alternatively, just look for a class that only exists in Java 7 using {{Class.forName}}, or look
for a public method or similar. This is easy to detect by catching ClassNotFoundException/NoSuchMethodException.
Apache Ant uses this to detect Java versions, too. The easiest variant is the well-known
method on the Throwable class:

{code:java}
public static final boolean JRE_IS_MINIMUM_JAVA7;
static {
    // Throwable.getSuppressed() was added in Java 7, so finding it implies Java 7 or later.
    boolean v7 = true;
    try {
        Throwable.class.getMethod("getSuppressed");
    } catch (Exception e) {
        // NoSuchMethodException on Java 6 and earlier
        v7 = false;
    }
    JRE_IS_MINIMUM_JAVA7 = v7;
}
{code}
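
The class-lookup variant of the second option could look roughly like the sketch below. It is only an illustration, not code from PDFBox, Lucene, or Ant: the class name {{FeatureProbeSketch}} and the helper {{classExists}} are made-up names, and {{java.lang.AutoCloseable}} is just one convenient example of a class that was added in Java 7. The sketch avoids Java 7 syntax so that it would still compile for Java 6.

```java
final class FeatureProbeSketch {
    private FeatureProbeSketch() {}

    /**
     * Hypothetical helper: returns true if the named class can be found,
     * without running its static initializer.
     */
    static boolean classExists(String className) {
        try {
            Class.forName(className, false, FeatureProbeSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Class file present but not loadable; treat as "not available".
            return false;
        }
    }

    // java.lang.AutoCloseable was introduced in Java 7 (assumed probe class).
    static final boolean JRE_IS_MINIMUM_JAVA7 = classExists("java.lang.AutoCloseable");

    public static void main(String[] args) {
        System.out.println("Java 7 or later: " + JRE_IS_MINIMUM_JAVA7);
    }
}
```

No string parsing is involved here, so a change in the version number format cannot break the check.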



> org.apache.pdfbox.util.PDFTextStripper class initialization throws NumberFormatException with recent Verona-enabled Java 9 JVMs
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-3155
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3155
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.8, 1.8.10
>            Reporter: Uwe Schindler
>            Priority: Critical
>
> Lucene/Solr runs its whole testsuite also with Java 9 EA releases to trigger bugs early. In our tests (Solr + TIKA) we found out that org.apache.pdfbox.util.PDFTextStripper throws a NumberFormatException in its static initializer when parsing the "java.version" system property. The reason for failure is a change in Java 9, where version numbers got a new format.
> There are 3 problems:
> - It should not assume that all components are really a number. So it should try/catch NumberFormatException and assign some "unknown" version
> - The code should really use "java.specification.version". This is standardized and only contains digits.
> - The code should also be prepared to handle version numbers without minor version! E.g. Java 9 only has "9" instead of "1.9" as its main version number.
> For the use case I would nuke this check and find a better workaround.
> Relying on String parsing for non-standardized system properties in a static class initializer is the reason why this bug is raised to level "Critical".
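
If string parsing is kept at all, the three points above could be combined roughly as in the following sketch. It is not the actual PDFBox fix: the class name {{SpecVersionSketch}}, the method {{featureVersion}}, and the use of -1 as the "unknown" value are assumptions made for illustration only.

```java
final class SpecVersionSketch {
    private SpecVersionSketch() {}

    /**
     * Hypothetical helper: maps a specification version string to a single
     * number ("1.7" -> 7, "9" -> 9) and returns -1 ("unknown") for anything
     * that cannot be parsed, instead of throwing.
     */
    static int featureVersion(String spec) {
        if (spec == null) {
            return -1;
        }
        // Limit -1 keeps trailing empty parts, so "1." fails cleanly below.
        String[] parts = spec.trim().split("\\.", -1);
        try {
            int first = Integer.parseInt(parts[0]);
            if (first == 1 && parts.length > 1) {
                // Old scheme: "1.6", "1.7", "1.8"
                return Integer.parseInt(parts[1]);
            }
            // New scheme without a minor version: "9"
            return first;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Evaluated in a static initializer, so it must never throw.
    static final boolean JRE_IS_MINIMUM_JAVA7 =
            featureVersion(System.getProperty("java.specification.version")) >= 7;

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + spec);
        System.out.println("feature version = " + featureVersion(spec));
    }
}
```

With this shape, an unexpected format only makes the check report "unknown"; it does not make class initialization fail.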



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


