commons-issues mailing list archives

From "Gary Lucas (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SANSELAN-56) proposed enhancement reduces load time for some image files by 40 percent
Date Tue, 18 Oct 2011 14:04:11 GMT

    [ https://issues.apache.org/jira/browse/SANSELAN-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129747#comment-13129747 ]

Gary Lucas commented on SANSELAN-56:
------------------------------------

Since I am claiming some pretty large performance gains with this change
(and also advocating it as a possible solution for other data formats
such as PNG and JPEG) I thought it appropriate to describe how I got
the numbers I posted.

I test mostly under Windows, with some testing under Linux.  Since a
modern OS is a noisy test environment, I try to run these tests when
as few other processes as possible are running on the system.

My test methods always repeat the test procedure in a loop
that performs the same operation multiple times.  I disregard
the timing results from the first few iterations of the loop
because (a) the initial iterations include the overhead of
the class loaders and the JIT compiler and (b) the Windows
file cache pretty much guarantees that loading a file a
second time takes a lot less time than loading it the first
time.

The Sanselan.getBufferedImage(File) method has logic to
determine which of its "image parsers" to use based on the
file extension and other criteria.  To eliminate this process
from the time measurements, I always create an instance
of the specific image parser directly.

Between iterations, I also explicitly put any objects
created by the test program out of scope and
run the garbage collector.  This approach is intended to
reduce the probability of significant garbage collection
running during the main operation that I am trying to time.

I have been testing with larger files (multiple megapixels)
so that the load times are long enough that the resolution
and noisiness of the system clock do not affect the timing.

Here's a snippet of the code I use for testing:

    // Imports assume the org.apache.sanselan package layout.
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import org.apache.sanselan.ImageReadException;
    import org.apache.sanselan.common.byteSources.ByteSourceFile;
    import org.apache.sanselan.formats.tiff.TiffImageParser;

    public void testImageLoadTime(File file)
            throws ImageReadException, IOException
    {
        long nPixel = 0;
        long sumTimeMS = 0;
        int nTests = 0;
        for (int i = 0; i < 5; i++) {

            // load the image
            // Sanselan has logic to pick the right parser for the
            // image format based on the file extension. But, to isolate
            // performance costs to specific functionality, we bypass
            // all of that and create an instance of a specific parser
            HashMap<String, Object> params = new HashMap<String, Object>();
            TiffImageParser tiffImageParser = new TiffImageParser();
            ByteSourceFile byteSource = new ByteSourceFile(file);
            // record the start time
            long time0 = System.nanoTime();
            BufferedImage bImage =
                    tiffImageParser.getBufferedImage(byteSource, params);
            // record the completion time
            long time1 = System.nanoTime();
            // compute difference and print elapsed time
            // for accumulated statistics, ignore the first three
            // trials (i = 0, 1, 2)
            long deltaMS = (time1 - time0) / 1000000;
            if (i > 2) {
                nTests++;
                sumTimeMS += deltaMS;
            }
            // widen before multiplying so huge images cannot overflow int
            nPixel = (long) bImage.getWidth() * bImage.getHeight();
            System.out.println("time (ms) =" + deltaMS);
            // put all relevant objects out-of-scope and
            // run garbage collector
            bImage = null;
            tiffImageParser = null;
            byteSource = null;
            Runtime.getRuntime().gc();
        }
        System.out.println("Number of pixels       " + nPixel);
        System.out.println("Average load time (ms) "
                + ((double) sumTimeMS / nTests));
    }
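
The same procedure can be packaged so that it does not depend on
Sanselan.  The sketch below is one possible arrangement, not the code
used to produce the posted numbers: the class name LoadTimer, the
method averageMillis, and its parameters are all made up for
illustration.  It times any Callable, leaves the first few runs out of
the average, and requests a garbage collection between runs (System.gc()
is only a request, so the JVM may still collect during a timed run).

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: a generic form of the timing loop shown above. */
public class LoadTimer {

    /**
     * Runs op the given number of times and returns the average elapsed
     * time in milliseconds, ignoring the first warmup runs.
     */
    public static double averageMillis(Callable<?> op, int runs, int warmup)
            throws Exception {
        if (warmup < 0 || runs <= warmup) {
            throw new IllegalArgumentException("need runs > warmup >= 0");
        }
        long sumNanos = 0;
        for (int i = 0; i < runs; i++) {
            long time0 = System.nanoTime();
            Object result = op.call();
            long time1 = System.nanoTime();
            // skip the runs that include class loading, JIT compilation,
            // and the first (uncached) read of the file
            if (i >= warmup) {
                sumNanos += time1 - time0;
            }
            // drop the reference and ask for a collection before the
            // next run
            result = null;
            System.gc();
        }
        return (sumNanos / 1.0e6) / (runs - warmup);
    }
}
```

To reproduce the loop above, the Callable would wrap the call to
tiffImageParser.getBufferedImage(byteSource, params), with runs = 5 and
warmup = 3.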


                
> proposed enhancement reduces load time for some image files by 40 percent
> -------------------------------------------------------------------------
>
>                 Key: SANSELAN-56
>                 URL: https://issues.apache.org/jira/browse/SANSELAN-56
>             Project: Commons Sanselan
>          Issue Type: Improvement
>         Environment: Tested in Windows, Linux, MacOS
>            Reporter: Gary Lucas
>              Labels: api-change
>         Attachments: Sanselan-56-SpeedEnhanceTiff.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have identified an enhancement that reduces the time required to load
> a TIFF image by 40 percent.  I have tested a modified version of Sanselan
> under Windows, Linux, and MacOS with consistent savings on each platform.
> Additionally, I suspect that this technique may be applicable to other
> areas of the Sanselan code base, including more popular image formats
> supported by Sanselan such as JPEG, PNG, etc.
> I propose to add the relevant code changes to the Sanselan code base.
> Once these modifications are in place, there would be an opportunity for
> others to look at the pros and cons of applying the techniques to other
> data formats.
> The Enhancement
> To load an image from a TIFF file, Sanselan performs extensive data
> processing in order to obtain RGB values for the pixels in the output
> image. The code for that processing appears to be well written and
> efficient. Once the RGB values are obtained, they are stored in a Java
> BufferedImage using a call to the setRGB() method.
> Unfortunately, setRGB() is an extremely inefficient method.  A much, much
> better approach is to store the data into an integer array and defer the
> creation of the buffered image until all information for the image has
> been collected.  Java has a nice (though somewhat obscure) API that lets
> memory in an integer array be transferred directly to a BufferedImage so
> that the system does not have to allocate additional memory for this
> procedure (a very nice feature when dealing with huge images).  This
> change virtually eliminated the overhead for transferring data to images,
> which accounted for 40 percent of the time required to load images.  For
> TIFF files, this was a reasonable approach because the TiffImageParser
> class always loads 4-byte images and the getGrayscaleBufferedImage()
> method is never used.  I have not investigated the code for the other
> image parsers, but some refinement might be needed for the one-byte
> grayscale images.
> Steps to Integration
> In sanselan.common, a new class called ImagePrep was created.  ImagePrep
> carries a width, height, and an integer array for storing pixels.  It
> provides its own setRGB() method which looks just like the one in
> BufferedImage.  Finally, it provides a method called getBufferedImage()
> which creates a BufferedImage from its internal integer array when the
> processing is complete.
> In the TiffImageParser classes, data is read from the input stream and
> transferred to pixel values in a series of classes known as
> PhotometricInterpreters.  These were modified to operate on ImagePrep
> objects rather than BufferedImage objects.  The DataReader and
> TiffImageParser classes were modified to pass ImagePrep objects into the
> photometric interpreters rather than using BufferedImages.
> At the very last step, before passing its result back to the calling
> method (the Sanselan main class, etc.), the TiffImageParser uses
> ImagePrep.getBufferedImage() to convert the result to the expected form.
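
The patch itself is not reproduced in this message, so the following is
only a sketch of what a class like ImagePrep could look like, based on
the description above.  The name ImagePrepSketch is hypothetical, and
the description does not say which API performs the transfer; the sketch
assumes the combination of DataBufferInt, Raster.createPackedRaster, and
DirectColorModel, which is one standard way to wrap an int array in a
BufferedImage without copying it.  It also assumes packed 32-bit ARGB
pixels.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBufferInt;
import java.awt.image.DirectColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;
import java.util.Hashtable;

/** Hypothetical stand-in for the ImagePrep class described above. */
public class ImagePrepSketch {
    private final int width;
    private final int height;
    private final int[] argb; // one packed ARGB value per pixel, row-major

    public ImagePrepSketch(int width, int height) {
        this.width = width;
        this.height = height;
        this.argb = new int[width * height];
    }

    /** Same argument order as BufferedImage.setRGB(x, y, rgb). */
    public void setRGB(int x, int y, int rgb) {
        argb[y * width + x] = rgb;
    }

    /** Wraps the pixel array in a BufferedImage without copying it. */
    public BufferedImage getBufferedImage() {
        // red, green, blue, alpha masks for packed ARGB
        int[] masks = {0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000};
        ColorModel colorModel = new DirectColorModel(
                32, masks[0], masks[1], masks[2], masks[3]);
        DataBufferInt buffer = new DataBufferInt(argb, argb.length);
        // scanline stride equals the width because rows are contiguous
        WritableRaster raster = Raster.createPackedRaster(
                buffer, width, height, width, masks, null);
        return new BufferedImage(colorModel, raster,
                colorModel.isAlphaPremultiplied(),
                new Hashtable<String, Object>());
    }
}
```

Because the image shares the array rather than copying it, any setRGB()
call made on the sketch object after getBufferedImage() would also
change the returned image.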

