pdfbox-users mailing list archives

From Andreas Lehmkuehler <andr...@lehmi.de>
Subject Re: Extract Embedded files from pdf using pdfbox in .NET application
Date Sat, 22 Jun 2013 07:07:06 GMT
Hi,


On 20.06.2013 12:48, Ramesh Shrestha wrote:
> Thanks,
>
> Following your suggestion to use annotations, I was able to extract the
> name of the embedded file; however, I could not extract the contents of
> that file. Please refer to the code below.
>
> var originalDocument = PDDocument.load(_PdfFile);
> var originalCatalog = originalDocument.getDocumentCatalog();
> java.util.List sourceDocumentPages = originalCatalog.getAllPages();
> var newDocument = new PDDocument();
>
> // number of pages in pdf file = 2
> int[] PageNumbers = { 1, 2 };
>
> foreach (var pageNumber in PageNumbers)
> {
>     // Page numbers are 1-based, but PDPages are contained in a zero-based array:
>     int pageIndex = pageNumber - 1;
>     try
>     {
>         PDPage pdpage = (PDPage)sourceDocumentPages.get(pageIndex);
>         java.util.List anno = pdpage.getAnnotations();
>         if (anno.size() > 0)
>         {
>             PDAnnotationFileAttachment pafa = (PDAnnotationFileAttachment)anno.get(0);
>             // FILENAME = GETCONTENTS()
>             string filename = pafa.getContents();
>             PDFileSpecification fs = pafa.getFile();
>         }
>     }
>     catch (Exception)
>     {
>         // an empty catch hides every failure, e.g. a failed cast when the
>         // first annotation is not a file attachment
>     }
> }
> Could you help me once more to extract the embedded file and save it to
> a specified location?

You already mentioned some sample code yourself. [1] demonstrates how to do that.

[1] 
http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
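
The step missing from the quoted code is writing the embedded file's bytes to disk. Below is a minimal, stdlib-only Java sketch of just that last step, assuming the bytes and the file name have already been read from the attachment. With PDFBox 1.8 they would presumably come from casting pafa.getFile() to PDComplexFileSpecification, calling getEmbeddedFile() and then getByteArray() on the result, and taking the name from getFile(); please check those calls against the javadoc of the version in use. The class EmbeddedFileWriter and its fallback name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class EmbeddedFileWriter {

    /**
     * Writes the decoded bytes of an embedded file into targetDir and returns
     * the path that was written. Only the last path component of the name
     * taken from the PDF is used, so "../../x.doc" cannot escape targetDir.
     */
    public static Path saveEmbeddedFile(Path targetDir, String nameFromPdf, byte[] data)
            throws IOException {
        String name = (nameFromPdf == null) ? "" : nameFromPdf.trim();
        // Drop any directory part; names stored in a PDF may use '/' or '\'.
        int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        name = name.substring(cut + 1);
        // Avoid drive-relative names such as "C:x.doc" on Windows.
        name = name.replace(':', '_');
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            name = "attachment.bin"; // hypothetical fallback name
        }
        Files.createDirectories(targetDir);
        Path out = targetDir.resolve(name);
        Files.write(out, data); // creates or truncates the file
        return out;
    }
}
```

Taking only the last path component matters because the name stored in the PDF is untrusted input; without it, a crafted name could write outside the target directory.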

> On Thu, Jun 20, 2013 at 2:46 PM, Ramesh Shrestha <rameshpasa@gmail.com> wrote:
>
>>
>> Even after trying annotations, I am not able to extract the
>> embedded/attached doc file located on a page of the PDF.
>>
>> On Tue, Jun 11, 2013 at 5:29 PM, Andreas Lehmkuehler <andreas@lehmi.de> wrote:
>>
>>> On 11.06.2013 07:06, Ramesh Shrestha wrote:
>>>
>>>> Thanks,
>>>>
>>>> The Java example link I provided should have been:
>>>>
>>>> http://svn.apache.org/repos/asf/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/pdmodel/ExtractEmbeddedFiles.java
>>>>
>>>> But your suggestion WORKS.
>>>>
>>>> Now I am able to extract the attached file located in the *attachments
>>>> tab*, but I *haven't been able to extract the attached file located on a
>>>> page*. I am getting a null efTree in this case:
>>>>
>>>>     PDDocumentNameDictionary namesDictionary = new PDDocumentNameDictionary(pdfDoc.getDocumentCatalog());
>>>>     PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
>>>>
>>>> So I am working on it now.
>>>>
>>> Embedded files are always document-related. If an embedded file is
>>> referenced on a single page, a file attachment annotation is used instead.
>>> Try something like this to get all annotations of a single page:
>>>
>>> List annotations = page.getAnnotations();
>>>
>>> The one you are looking for has to be an instance of the class
>>> org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationFileAttachment.
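
Because a page can carry several kinds of annotations (links or text notes, for example), the first one is not necessarily the file attachment, so each element should be type-checked before casting. A minimal Java sketch of such a filter that also accepts the raw, non-generic lists returned by the PDFBox 1.x API; the class Annotations and the method ofType are hypothetical helpers, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

public final class Annotations {

    /**
     * Returns the elements of a (possibly raw) Iterable that are instances of
     * the given type, already cast. Elements of other types are skipped instead
     * of causing a ClassCastException; a null input yields an empty list.
     * The raw parameter type keeps List<T> as the result type even when the
     * caller passes a raw List.
     */
    @SuppressWarnings("rawtypes")
    public static <T> List<T> ofType(Iterable items, Class<T> type) {
        List<T> result = new ArrayList<>();
        if (items == null) {
            return result;
        }
        for (Object item : items) {
            if (type.isInstance(item)) {
                result.add(type.cast(item));
            }
        }
        return result;
    }
}
```

Usage, assuming PDFBox 1.8 on the classpath: for (PDAnnotationFileAttachment a : Annotations.ofType(page.getAnnotations(), PDAnnotationFileAttachment.class)) { ... }. On the .NET side the same check-then-cast can be done per element with the C# is/as operators.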
>>>
>>>> On Mon, Jun 10, 2013 at 7:38 PM, Andreas Lehmkuehler <andreas@lehmi.de> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 10.06.2013 11:22, Ramesh Shrestha wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am developing a .NET application using PDFBox to extract metadata,
>>>>>> content and attached files from PDFs.
>>>>>>
>>>>>> I was able to extract metadata and content, but I am stuck extracting
>>>>>> attached/embedded files.
>>>>>>
>>>>>> I have a PDF with an embedded/attached doc file and want to retrieve
>>>>>> that file. I have gone through the Java example:
>>>>>>
>>>>>> http://www.docjar.com/html/api/org/apache/pdfbox/examples/pdmodel/EmbeddedFiles.java.html
>>>>>>
>>>>>> But when trying to use it in .NET, I got the error "non generic type
>>>>>> 'java.util.Map' cannot be used with type arguments" for the following
>>>>>> code snippet:
>>>>>>
>>>>>> java.util.Map<String, COSObjectable> names = efTree.getNames();
>>>>>>
>>>>>> I would be grateful if anybody could help me extract the file from
>>>>>> the PDF.
>>>>>>
>>>>> I'm not a .NET expert and don't know what may cause that issue, but
>>>>> maybe it is a good idea to just omit the generics and try something
>>>>> like this:
>>>>>
>>>>> java.util.Map names = efTree.getNames();
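
Without the generics, every key and value coming out of the map is a plain Object and has to be checked and cast by hand. A minimal Java sketch of that pattern, copying a raw map into a typed one and skipping anything unexpected. RawMaps.typedEntries is a hypothetical helper; the assumption that the values of the embedded files name tree are PDComplexFileSpecification objects should be verified against the PDFBox version in use. Note also that a name tree node holds either its own names or child nodes (getKids()), so getNames() on the root can return null for larger trees; the helper then returns an empty map, and the children still have to be visited.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class RawMaps {

    /**
     * Copies the entries of a raw (non-generic) java.util.Map whose key is a
     * String and whose value is an instance of valueType into a typed map,
     * preserving iteration order. Other entries are skipped; a null map
     * yields an empty result.
     */
    @SuppressWarnings("rawtypes")
    public static <V> Map<String, V> typedEntries(Map raw, Class<V> valueType) {
        Map<String, V> result = new LinkedHashMap<>();
        if (raw == null) {
            return result;
        }
        for (Object o : raw.entrySet()) {
            Map.Entry e = (Map.Entry) o;
            if (e.getKey() instanceof String && valueType.isInstance(e.getValue())) {
                result.put((String) e.getKey(), valueType.cast(e.getValue()));
            }
        }
        return result;
    }
}
```

Usage, under the same assumption: Map<String, PDComplexFileSpecification> specs = RawMaps.typedEntries(efTree.getNames(), PDComplexFileSpecification.class);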
>>>>>
>>>>>> Thanks in advance.
>>>>>
>>>>> HTH
>>>>> Andreas Lehmkühler
>>>>>
>>>>
>>> BR
>>> Andreas Lehmkühler

BR
Andreas Lehmkühler

