From: Thomas Fischer
Subject: Re: pdfbox not working for my pdf file.
Date: Sun, 13 Dec 2009 18:45:38 +0100
To: users@pdfbox.apache.org

Hi,

it seems that some PDF files created using TeX do badly with PDFBox. I have a version of
http://www.ams.org/era/2003-09-03/S1079-6762-03-00108-2/S1079-6762-03-00108-2.pdf
which produces similar results to your file:

BXC4BXBVCCCAC7C6C1BV CABXCBBXBTCABVC0 BTC6C6C7CDC6BVBXC5BXC6CCCB...

My impression is that this is due to some errors in the chain TeX -> DVI -> PS -> PDF, probably caused by earlier versions of dvips (< 5.97?). I can create a readable PDF file from the respective DVI file using different tools like dvipdfmx.

So I am not sure whether this is actually a PDFBox bug or a problem with invalid PDF files, although JHOVE claims that the file is well-formed and valid. I suppose that a character or glyph table is not recognised or found, and JHOVE doesn't check "the glyph descriptions of embedded fonts".

Actually, the version of the document mentioned above will crash both PDFBox 7.3 and 8.0
(Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider),
while pdftotext seems to do a reasonable job.

On my file mentioned, pdftotext will produce

Ä ÌÊÇÆÁ Ê Ë Ê À ÆÆÇÍÆ Å ÆÌË Ç ÌÀÅ ÊÁ Æ Å ÌÀ Å ÌÁ Ä ËÇ Á Ì ÎÓÐÙÑ

which is no more helpful than the result from PDFBox...
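A note on the PDFBox sample output above (the three groups starting with BXC4...): they look as if every two characters stand for one letter. The sketch below is only a guess derived from that one sample, not from any PDFBox or dvips documentation. It assumes each pair is a base-36 number (digits 0-9, then A-Z) equal to the character code plus 360. The class name GlyphNameDecoder and the offset constant are hypothetical. Under that assumption the sample decodes to readable words, which would fit the idea that the text is present in the file and only the glyph-to-character mapping is not recognised.

```java
// Hypothetical helper, not part of PDFBox.
class GlyphNameDecoder {
    // ASSUMPTION inferred from the sample only: pair value = character code + 360.
    private static final int ASSUMED_OFFSET = 360;

    // Decodes whitespace-separated words made of two-character base-36 pairs.
    static String decode(String garbled) {
        StringBuilder out = new StringBuilder();
        for (String word : garbled.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Walk the word pair by pair; a trailing odd character is ignored.
            for (int i = 0; i + 1 < word.length(); i += 2) {
                int code = Integer.parseInt(word.substring(i, i + 2), 36) - ASSUMED_OFFSET;
                // Emit '?' for anything outside printable ASCII.
                out.append(code >= 32 && code < 127 ? (char) code : '?');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The first two words of the sample quoted in the message.
        System.out.println(decode("BXC4BXBVCCCAC7C6C1BV CABXCBBXBTCABVC0"));
        // prints "ELECTRONIC RESEARCH"
    }
}
```

The third group of the sample decodes to ANNOUNCEMENTS under the same assumption. Whether the rule holds for other files, or for the "a34 a85" style output reported in PDFBOX-534, is not known from this thread.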
I suppose that anybody willing to try may play around with the files available at
http://www.emis.de/journals/ERA-AMS/2003-01-003/2003-01-003.html

All the best
Thomas

On 12.12.2009 at 18:58, Ernesto De Santis wrote:

> Hi,
>
> I have a PDF file that PDFBox can't read. PDFBox reads it without errors,
> but the output is only in a strange format, like codes. Always an 'a'
> letter and two numbers: a34 a85 a94 a92......
>
> I reported it as a bug some time ago, without any news about it.
>
> My file was generated with LaTeX; I use the Kile editor on Ubuntu.
> This is the bug issue:
>
> https://issues.apache.org/jira/browse/PDFBOX-534
>
> Regards,
> Ernesto.
>
> users-digest-help@pdfbox.apache.org wrote:
>>
>> ------------------------------------------------------------------------
>>
>> Subject: Re: pdfbox not working for my pdf file.
>> From: Thomas Fischer
>> Date: Thu, 10 Dec 2009 07:57:19 +0100
>> To: users@pdfbox.apache.org
>>
>> Hi,
>>
>> I've been testing PDFBox on a number of different (mathematical) PDF files, and my experience shows that PDFBox works in principle on all PDF files that are not image based, with some specific errors depending on the type and creation of the file. If you create a PDF file by binding images together then this file can't be read by PDFBox.
>> An easy test is to try to copy text from your file using any PDF reader, e.g. Adobe's. If you can copy text, PDFBox should be able to read it.
>>
>> Cheers
>> Thomas
>>
>> On 09.12.2009 at 13:38, (sender not shown) wrote:
>>
>>> Hi,
>>> Can PDFBox work for all types of PDF files?
>>> My PDF file size is 6 MB, but it's not working fine with
>>> PDFTextStripper, i.e. stripperobject.getText(doc);
>>> Please let me know whether this can be used for all types of PDF files.
>>> If not, how can one decide whether a particular file is compatible with
>>> PDFBox?
>>> Thanks in advance!