commons-issues mailing list archives

From "Gary Lucas (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SANSELAN-76) Reduce memory use of TIFF readers
Date Sun, 06 May 2012 01:59:47 GMT

    [ https://issues.apache.org/jira/browse/SANSELAN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269102#comment-13269102
] 

Gary Lucas edited comment on SANSELAN-76 at 5/6/12 1:59 AM:
------------------------------------------------------------

I prepared a patch showing the changes that produced the reduction in memory use I reported
in my earlier comments.  The changes involve two classes, DataReaderStrips.java and DataReaderTiles.java,
that I had previously modified for the still-pending patches I submitted for tracker item
58.  To keep the work separate, I backed out the changes from item 58 and made sure
I worked on pristine versions of the classes from the Apache Imaging development trunk.

The downside of doing that is that the two tracker items now represent parallel versions
of the code.  I am highly motivated to get these changes into the code base because they permit
me to access large TIFF files that were previously unreadable for my application due to memory
use.  So let me know if you need me to prepare additional patches for submission.

  
                
      was (Author: gwlucas):
    Patch showing changes
                  
> Reduce memory use of TIFF readers
> ---------------------------------
>
>                 Key: SANSELAN-76
>                 URL: https://issues.apache.org/jira/browse/SANSELAN-76
>             Project: Commons Sanselan
>          Issue Type: Improvement
>          Components: Format: TIFF
>            Reporter: Gary Lucas
>         Attachments: Tracker_76_Test_5_May_2012.patch
>
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> This Tracker Item proposes changes to the TIFF file readers to address memory issues
when reading very large images from TIFF files.  The TIFF format is used extensively in technical
applications such as aerial photographs, satellite images, and digital raster maps, which feature
very large image sizes.  For example, the public-domain Natural Earth Data set features raster
files sized 21,600 by 10,800 pixels (222.5 megapixels, counting a megapixel as 2^20 pixels).
Although this example is unusually large, image sizes of 25 to 100 megapixels are common for
such applications.
> Unfortunately, when Sanselan reads a TIFF image, it consumes nearly twice as much memory
as is necessary.  The reader operates in two stages: first it reads the entire source file
into memory, then it builds the output image, also in memory.  In the example file mentioned
above, the source data runs from 83.19 to 373 megabytes (depending on compression).  Thus
Sanselan would require a minimum of 83.19 + 4*222.5 = approximately 973 megabytes to produce
an image for one of these files (allowing 4 bytes per pixel in the output BufferedImage).
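
The arithmetic in the preceding paragraph can be checked with a short sketch. The class and method names below are hypothetical and are not part of Sanselan; the sketch assumes a megabyte of 2^20 bytes and 4 bytes per pixel, as the description does.

```java
/** Rough memory estimate for a two-stage TIFF read (illustrative only, not Sanselan code). */
final class TiffMemoryEstimate {
    // Assumption: megabytes are counted as 2^20 bytes.
    private static final double BYTES_PER_MEGABYTE = 1024.0 * 1024.0;

    /** Size in megabytes of a decoded image held entirely in memory. */
    static double imageMegabytes(int width, int height, int bytesPerPixel) {
        // Cast first so the product is computed in double and cannot overflow an int.
        return (double) width * height * bytesPerPixel / BYTES_PER_MEGABYTE;
    }

    /** Peak use when the whole source file and the whole output image coexist in memory. */
    static double twoStagePeakMegabytes(double sourceMegabytes, int width, int height,
                                        int bytesPerPixel) {
        return sourceMegabytes + imageMegabytes(width, height, bytesPerPixel);
    }
}
```

For the 21,600 by 10,800 example, `TiffMemoryEstimate.imageMegabytes(21600, 10800, 4)` gives roughly 890 megabytes for the output image alone; `twoStagePeakMegabytes(83.19, 21600, 10800, 4)` adds the smallest source file on top of that.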
> Fortunately, TIFF files are organized so that they can be read a piece at a time.  TIFF
files are divided into either strips or tiles and, if data compression is used, each piece
is compressed individually.  Thus each piece can be decoded independently of the others.
> This item proposes to implement two changes:
> 1) Allow the TIFF data reader to read the files one piece at a time while constructing
the buffered image.  Thus the memory use for reading would be no larger than the piece size.
This would be an internal change, so the external appearance of the Sanselan getBufferedImage
methods would not change.
> 2) Provide new API elements that permit applications to read the strips or tiles from
TIFF files individually.  This change would support applications that need to access
very large TIFF files without committing the memory to store a BufferedImage for the entire
file (a 222.5 megapixel image requires 890 megabytes, which is a lot even by contemporary
standards).
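
As a minimal sketch of the piece-at-a-time idea, assuming a random-access source and strip offsets and byte counts that have already been read from the TIFF directory (the StripOffsets and StripByteCounts tags), something like the following would keep only one strip of source data in memory at a time. The class, the StripHandler callback, and the method name are hypothetical; this is not the patch attached to this item, and decompression is left to the callback.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Illustrative strip-at-a-time reader; names are hypothetical, not Sanselan API. */
final class StripAtATimeReader {

    /** Hypothetical callback that receives the raw (possibly compressed) bytes of one strip. */
    interface StripHandler {
        void handle(int stripIndex, byte[] buffer, int length) throws IOException;
    }

    /**
     * Reads each strip in turn into a single reused buffer and passes it to the handler.
     * Returns the buffer size, which is the peak amount of source data held at once.
     */
    static int readStrips(RandomAccessFile file, long[] offsets, int[] byteCounts,
                          StripHandler handler) throws IOException {
        if (offsets.length != byteCounts.length) {
            throw new IllegalArgumentException("offsets and byteCounts must have equal length");
        }
        int largest = 0;
        for (int count : byteCounts) {
            largest = Math.max(largest, count);
        }
        byte[] buffer = new byte[largest]; // sized for the largest strip, reused for all strips
        for (int i = 0; i < offsets.length; i++) {
            file.seek(offsets[i]);
            file.readFully(buffer, 0, byteCounts[i]);
            handler.handle(i, buffer, byteCounts[i]);
        }
        return largest;
    }
}
```

A handler that decodes each strip straight into the rows of the output BufferedImage would correspond to change 1; exposing the per-strip data to the application would correspond to change 2. Because the buffer is reused, a handler must copy any bytes it wants to keep.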
> There is one minor issue in this implementation that is easily addressed.  Sanselan reads
images from ByteSources that can be either random-access files or sequential-access input
streams.  In the case of sequential-access input streams, it may be hard to perform a partial read
on a TIFF directory.  In such a case, the TIFF access routines might have to resort to reading
the entire source data into memory, as they currently do.  This would simply be a limitation
of the implementation.
> There is one issue that may make this change a bit problematic.  The TIFF processors
depend on accessing a class called TiffDataElement that contains a public array of bytes called
"data".  The most expeditious way of implementing the enhancement is to make this element
private and add an accessor that either returns the data from internal memory or else loads
it on demand.  Unfortunately, because the data element is declared public, there is a chance
that some existing applications are using it directly.  In hindsight, it is clear that declaring
this element public was a mistake, but it may be too late to fix it.  So care will be required
to ensure that compatibility is maintained.  The most likely solution seems to be to implement
a new class for passing raw data from the source TIFF files to the DataReader implementations.
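
One possible shape for such a class is sketched below, assuming a random-access source. LazyDataElement and its methods are hypothetical names chosen for illustration; they are not the existing TiffDataElement and not the class in the attached patch. The byte array is private and is read from the file only when the accessor is first called.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical raw-data holder with a load-on-demand accessor (not Sanselan API). */
final class LazyDataElement {
    private final long offset;
    private final int length;
    private byte[] data; // null until the bytes are first requested

    LazyDataElement(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    /** Wraps bytes that are already in memory, e.g. when the source is a sequential stream. */
    static LazyDataElement preloaded(byte[] bytes) {
        LazyDataElement element = new LazyDataElement(0, bytes.length);
        element.data = bytes;
        return element;
    }

    boolean isLoaded() {
        return data != null;
    }

    /** Returns the cached bytes, reading them from the file on the first call only. */
    byte[] getData(RandomAccessFile file) throws IOException {
        if (data == null) {
            byte[] loaded = new byte[length];
            file.seek(offset);
            file.readFully(loaded);
            data = loaded;
        }
        return data;
    }
}
```

Keeping this as a separate class, rather than changing the visibility of the existing public field, would leave any application that reads the field directly unaffected.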

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
