pdfbox-dev mailing list archives

From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PDFBOX-4514) inefficient use of synchronized in PDICCBased.java
Date Tue, 16 Apr 2019 17:13:00 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-4514:
------------------------------------
    Component/s: Rendering

> inefficient use of synchronized in PDICCBased.java
> --------------------------------------------------
>
>                 Key: PDFBOX-4514
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4514
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.15
>            Reporter: Jason
>            Priority: Minor
>
> PDICCBased.java synchronizes on a static variable, e.g. synchronized (LOG). It does not look to me like it really needs to be done this way. This is very inefficient when multiple threads process different PDFs at the same time. Changing it to synchronized (this) will improve performance.
> [https://github.com/apache/pdfbox/blob/3b16f3b4f42c61dd5fe990c586f60465f83a8ef8/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/color/PDICCBased.java#L191]
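To illustrate the difference the reporter describes, here is a minimal, PDFBox-free sketch. The class `LockGranularityDemo`, the nested `Resource` type, and the `SHARED` field are hypothetical stand-ins (with `SHARED` playing the role of a static field such as `LOG`); nothing here is taken from PDICCBased.java. It only shows that a static monitor serializes guarded sections across all instances, while `synchronized (this)` lets different instances proceed independently. Whether per-instance locking is actually safe in PDICCBased depends on what the lock protects, which this message does not establish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockGranularityDemo {
    // Stand-in for a static field such as LOG: one monitor shared by every instance.
    private static final Object SHARED = new Object();

    /** Hypothetical object with a guarded section; not PDFBox code. */
    static class Resource {
        private final boolean useSharedLock;

        Resource(boolean useSharedLock) {
            this.useSharedLock = useSharedLock;
        }

        void guarded(Runnable body) {
            // Either the class-wide monitor or this instance's own monitor.
            Object monitor = useSharedLock ? SHARED : this;
            synchronized (monitor) {
                body.run();
            }
        }
    }

    /**
     * Returns true if a second instance can enter its guarded section
     * while a first instance is still inside its own guarded section.
     */
    public static boolean canOverlap(boolean useSharedLock) {
        Resource first = new Resource(useSharedLock);
        Resource second = new Resource(useSharedLock);
        CountDownLatch firstInside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch secondInside = new CountDownLatch(1);
        try {
            // Thread a enters the first instance's section and stays there.
            Thread a = new Thread(() -> first.guarded(() -> {
                firstInside.countDown();
                awaitQuietly(release);
            }));
            a.start();
            firstInside.await();

            // Thread b tries to enter the second instance's section.
            Thread b = new Thread(() -> second.guarded(secondInside::countDown));
            b.start();
            boolean overlapped = secondInside.await(1, TimeUnit.SECONDS);

            // Let thread a leave so that both threads can finish.
            release.countDown();
            a.join();
            b.join();
            return overlapped;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("static lock, instances overlap:   " + canOverlap(true));
        System.out.println("instance lock, instances overlap: " + canOverlap(false));
    }
}
```

With the shared monitor the second thread cannot enter until the first one leaves, so the check times out; with the per-instance monitor the two sections run side by side.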
> Sample code simulating multiple threads processing different PDFs at the same time:
>  
> {code:java}
> import java.io.File;
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> import org.apache.pdfbox.multipdf.Splitter;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
>
> // The class name is arbitrary; the original snippet showed only the methods.
> public class RenderTimingTest {
>   public static void main(String[] args) throws IOException {
>     for (int i = 0; i < 10; i++) { // just run multiple times
>       doWork();
>     }
>   }
>
>   private static void doWork() throws IOException {
>     long startTime = System.currentTimeMillis();
>     String pdfFilename = "<absolute path to your pdf file>"; // replace this with your test file
>     System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
>     PDDocument document = PDDocument.load(new File(pdfFilename));
>     // split into single-page documents so that each thread renders its own PDDocument
>     List<PDDocument> pdfPages = new Splitter().split(document);
>     Map<Integer, PDDocument> pdfPagesWithIndex = new HashMap<>();
>     for (int i = 0; i < pdfPages.size(); i++) {
>       pdfPagesWithIndex.put(i, pdfPages.get(i));
>     }
>     // multiple threads running in parallel
>     pdfPagesWithIndex.entrySet().parallelStream().forEach(entry -> {
>       try {
>         processPDF(entry.getKey(), entry.getValue());
>       } catch (Exception e) {
>         System.out.println(e);
>       }
>     });
>     System.out.println("Conversion time: " + (System.currentTimeMillis() - startTime));
>     try {
>       document.close();
>     } catch (IOException ignored) {
>     }
>   }
>
>   private static void processPDF(int index, PDDocument pdfPage) throws IOException {
>     PDFRenderer renderer = new PDFRenderer(pdfPage);
>     try {
>       // render the only page of the split document at 180 DPI
>       renderer.renderImageWithDPI(0, 180, ImageType.RGB);
>     } catch (IOException e) {
>       System.out.println(e);
>     }
>     try {
>       pdfPage.close();
>     } catch (IOException ignored) {
>     }
>   }
> }
> {code}
> I observed that changing synchronized (LOG) to synchronized (this) reduces the latency of the above code by maybe 20-30%. If I take a thread dump, I can see that many threads are blocked on synchronized (LOG).
>  
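For readers who want to check the thread-dump observation without an external tool, the following is a small sketch using the standard `java.lang.management` API. The class name `BlockedThreadReport` and its methods are hypothetical and not part of PDFBox; the demo blocks one thread on a plain `Object` monitor rather than on the `LOG` field from the issue. In a real run, one would call `blockedThreads()` from a separate thread while the rendering threads are busy.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadReport {
    /** Lists threads that are currently BLOCKED, with the monitor they are waiting for. */
    public static List<String> blockedThreads() {
        List<String> report = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Blocks one helper thread on a monitor held by the caller and reports it. */
    public static List<String> demo(String waiterName) {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; the thread only needs to contend for the monitor
            }
        }, waiterName);
        try {
            List<String> report;
            synchronized (lock) {
                waiter.start();
                // wait until the helper thread is actually blocked on the monitor
                while (waiter.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(10);
                }
                report = blockedThreads();
            }
            waiter.join();
            return report;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        demo("demo-waiter").forEach(System.out::println);
    }
}
```

Many entries naming the same monitor in such a report would match the contention pattern described in the issue.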



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

