pdfbox-dev mailing list archives

From "Dave Smith (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PDFBOX-1067) PDF Scan from Xerox WorkCentre 5030 renders as all black
Date Sat, 10 Mar 2012 14:32:57 GMT

    [ https://issues.apache.org/jira/browse/PDFBOX-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226867#comment-13226867 ]

Dave Smith commented on PDFBOX-1067:
------------------------------------

If you read the PDF spec (3.3.6 JBIG2Decode Filter), it explicitly says to strip the JBIG2 file
header, to move the segments for page 0 (the global segments) into a separate JBIG2Globals
stream, and to drop the end-of-page and end-of-file segments. Because the header is missing,
ImageIO's format detection will never find the right reader. So we have to look the reader up
manually by format name and insert the globals segments in front of the rest of the stream data.
Here is a proof of concept that you can try with http://code.google.com/p/jbig2-imageio:
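To illustrate the point about the missing header: a standalone JBIG2 file begins with an 8-byte ID string (0x97 0x4A 0x42 0x32 0x0D 0x0A 0x1A 0x0A), and that is exactly what the PDF embedding strips. The sketch below, with hypothetical class and method names and arbitrary stand-in bytes, checks for that signature. Any detection of this kind has nothing to match in a JBIG2Decode stream, which is why the reader has to be chosen by name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox or jbig2-imageio.
public class Jbig2HeaderCheck {
    // The 8-byte ID string that starts a standalone JBIG2 file.
    static final byte[] JBIG2_FILE_ID = {
        (byte) 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Returns true if the stream starts with the JBIG2 file ID string.
    // Data embedded in a PDF has had this header removed.
    static boolean hasJbig2FileHeader(InputStream in) throws IOException {
        byte[] start = new byte[JBIG2_FILE_ID.length];
        int n = in.readNBytes(start, 0, start.length); // Java 9+
        return n == start.length && Arrays.equals(start, JBIG2_FILE_ID);
    }

    public static void main(String[] args) throws IOException {
        byte[] standalone = {(byte) 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A, 0x01};
        // Arbitrary bytes standing in for an embedded stream (no file ID)
        byte[] embedded = {0x00, 0x00, 0x00, 0x01, 0x30, 0x00, 0x01};
        System.out.println(hasJbig2FileHeader(new ByteArrayInputStream(standalone))); // true
        System.out.println(hasJbig2FileHeader(new ByteArrayInputStream(embedded)));   // false
    }
}
```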

In org.apache.pdfbox.filter.JBIG2Filter, the new decode method:

    @Override
    public void decode( InputStream compressedData, OutputStream result, COSDictionary options,
            int filterIndex ) throws IOException
    {
        // The embedded JBIG2 data has no file header, so ImageIO cannot detect the
        // format from the content; look the reader up by format name instead.
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("JBIG2");
        if (!readers.hasNext())
        {
            log.error( "Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.");
            return;
        }
        ImageReader reader = readers.next();
        try
        {
            // DecodeParms and its JBIG2Globals entry are both optional
            COSDictionary decodeP = (COSDictionary) options.getDictionaryObject(COSName.DECODE_PARMS);
            COSStream st = null;
            if (decodeP != null)
            {
                st = (COSStream) decodeP.getDictionaryObject(COSName.getPDFName("JBIG2Globals"));
            }
            // The global segments must come before the page segments in the reader's input
            InputStream data = st != null
                    ? JBIG2StreamMerge(st.getFilteredStream(), compressedData)
                    : compressedData;
            reader.setInput(ImageIO.createImageInputStream(data));

            BufferedImage bi = reader.read(0);
            if ( bi == null )
            {
                log.error( "Something went wrong when decoding the JBIG2 encoded datastream.");
                return;
            }
            DataBuffer dBuf = bi.getData().getDataBuffer();
            if ( dBuf.getDataType() == DataBuffer.TYPE_BYTE )
            {
                // Write the raw sample bytes as the decoded filter output
                result.write( ( ( DataBufferByte ) dBuf ).getData() );
            }
            else
            {
                log.error( "Image data buffer not of type byte but type " + dBuf.getDataType() );
            }
        }
        finally
        {
            // Release any resources held by the reader
            reader.dispose();
        }
    }
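The decode method hands the raw DataBufferByte contents to `result` as the filter output. For a 1-bit image those bytes are packed rows, each padded to a byte boundary, which matches how PDF lays out image samples with BitsPerComponent 1. The sketch below shows the expected buffer size. It assumes the JBIG2 reader returns a TYPE_BYTE_BINARY image, which the message does not confirm, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;

// Hypothetical helper illustrating the packed 1-bit buffer layout.
public class PackedBitmapSize {
    // Bytes per scanline for a 1-bit image: each row is padded to a whole byte.
    static int packedRowBytes(int width) {
        return (width + 7) / 8;
    }

    // Returns the raw bytes backing a TYPE_BYTE_BINARY image, i.e. what
    // decode() would write to its output for such an image.
    static byte[] rawBytes(BufferedImage bi) {
        DataBuffer dBuf = bi.getRaster().getDataBuffer();
        return ((DataBufferByte) dBuf).getData();
    }

    public static void main(String[] args) {
        int width = 10, height = 3;
        BufferedImage bi = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        byte[] data = rawBytes(bi);
        // 10 pixels need 2 bytes per row, so 3 rows give 6 bytes
        System.out.println(data.length + " == " + packedRowBytes(width) * height);
    }
}
```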

    // ugly. Should use some sort of stream merge ...
    protected static InputStream JBIG2StreamMerge( InputStream globals, InputStream body )
            throws IOException
    {
        // Copy the globals first, then the page data, into one in-memory buffer
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int read;
        while ( (read = globals.read(buf)) != -1 )
        {
            out.write(buf, 0, read);
        }
        while ( (read = body.read(buf)) != -1 )
        {
            out.write(buf, 0, read);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }
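The comment on the helper asks for a stream merge instead of copying both streams into a byte array. The JDK already provides java.io.SequenceInputStream, which reads one stream to its end and then continues with the next. Here is a sketch of that alternative; the class and method names are illustrative and not part of PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

// Hypothetical replacement for the buffering merge helper.
public class StreamMergeSketch {
    // Lazily concatenates the globals and the page data: bytes from `globals`
    // are returned first, then bytes from `body`, without buffering either one.
    static InputStream merge(InputStream globals, InputStream body) {
        return new SequenceInputStream(globals, body);
    }

    public static void main(String[] args) throws IOException {
        InputStream globals = new ByteArrayInputStream(new byte[] {1, 2, 3});
        InputStream body = new ByteArrayInputStream(new byte[] {4, 5});
        try (InputStream merged = merge(globals, body)) {
            byte[] all = merged.readAllBytes(); // Java 9+
            System.out.println(java.util.Arrays.toString(all)); // [1, 2, 3, 4, 5]
        }
    }
}
```

The merged stream can only be read once. That should not matter here, because ImageIO.createImageInputStream wraps a plain InputStream in a caching ImageInputStream, so the reader can still seek within data it has already read.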

                
> PDF Scan from Xerox WorkCentre 5030 renders as all black
> --------------------------------------------------------
>
>                 Key: PDFBOX-1067
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1067
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel
>    Affects Versions: 1.6.0
>         Environment: Tested on MacOS X 10.6.7, Ubuntu 10.10, Windows 7
>            Reporter: Sarah Kelley
>         Attachments: ItDoesntWorkScan.pdf, sakelley_pdf_rendering_problem.zip
>
>
>     The file "ItDoesntWorkScan.pdf" renders to an empty
>     black page. This file is a copy of "ItDoesntWorkPrinted.pdf"
>     that has been printed on paper, and then scanned with
>     a Xerox WorkCentre 5030 scanner, which then emails a pdf file
>     back to the user.
>     Tested On:
>         - Mac OS 10.6
>         - Windows 7
>         - Ubuntu 10.10
>     Unfortunately, the WorkCentre 5030 doesn't appear to have
>     many user-settable options for scanning to PDF, so we weren't
>     really able to try scanning with settings other than the defaults.
> Will attach pdf and code to demonstrate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
