lucene-java-user mailing list archives

From sergiu gordea <gser...@ifit.uni-klu.ac.at>
Subject Re: Need advice: what pdf lib to use?
Date Mon, 25 Oct 2004 09:20:14 GMT
iouli.golovatyi@group.novartis.com wrote:

 Hi Iouli,

  If you don't think it is illegal, you can hack the PDFBox code to remove 
the protection ...

    Sergiu
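
For the failures described in the quoted messages (the permission IOException, the "dead loop"), one way to keep a batch indexing run alive is to isolate each extraction behind a timeout. What follows is only a minimal sketch: `TextExtractor`, `GuardedExtraction` and `extractOrSkip` are hypothetical names standing in for whatever PDF library call is used, and none of them are PDFBox API.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedExtraction {

    /** Hypothetical stand-in for the real PDF library call; not PDFBox API. */
    public interface TextExtractor {
        String extract(File pdf) throws IOException;
    }

    /**
     * Runs the extractor on a worker thread and gives up after timeoutSeconds.
     * Returns null when extraction fails or times out, so the caller can
     * log the file and continue with the next one.
     */
    public static String extractOrSkip(TextExtractor extractor, File pdf, long timeoutSeconds) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "pdf-extract");
            t.setDaemon(true); // a stuck parser must not keep the JVM alive
            return t;
        });
        Callable<String> task = () -> extractor.extract(pdf);
        Future<String> result = worker.submit(task);
        try {
            return result.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // only interrupts; a tight loop may ignore it
            return null;
        } catch (ExecutionException e) {
            // the extractor threw, e.g. an IOException for a protected document
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return null;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller would treat a null result as "skip this file and log it". Note that a parser stuck in a genuine endless loop keeps its daemon thread spinning until the JVM exits, so running extraction in a separate process is the stronger isolation if that happens often.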

>PDFBox also stumbles with "class java.io.IOException with message:  - You 
>do not have permission to extract text" when the doc is copy/print 
>protected.
>I have now tested the Snowtide commercial product and it looks like it can 
>process these files as well. Performance was also not bad. Unfortunately 
>the test result cannot be considered 100% conclusive, because the free 
>version processes just the first 8 pages. After all, this product costs a 
>fortune (as long as the company is ready to pay I don't really mind :))
>
>J.
>
>Robert Newson <rnewson@connected.com>
>Sent by: news <news@sea.gmane.org>
>24.10.2004 17:44
>Please respond to "Lucene Users List"
>
>        To:     lucene-user@jakarta.apache.org
>        cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
>        Subject:        Re: Need advice: what pdf lib to use?
>
>iouli.golovatyi@group.novartis.com wrote:
>
>>Hello all,
>>
>>I need a piece of advice/experience.
>>
>>What PDF parser (written in Java) would you recommend?
>>
>>I have now played with PDFBox-0.6.7a and would not say I was too 
>>satisfied with it.
>>
>>On certain PDFs (not well formatted, but still readable with Acrobat) it 
>>runs into a dead loop (this I could fix in the code),
>>and on one file it produced an "out of memory" error and killed the JVM :( 
>>(this problem I have not identified yet).
>>
>>The performance was not too great either: it took c. 19 h to 
>>index 13000 files (c. 3.5 GB).
>>
>>Regards,
>>J.
>>
>
>On the specific problem of the "dead loop", I reported an instance of 
>this to Ben a week or so ago and he has fixed it in the latest 
>nightlies.  I expect an official release will include this bugfix soon. 
>The file in question was unreadable with any PDF software I have, but 
>someone managed to create it somehow...
>
>http://sourceforge.net/tracker/index.php?func=detail&aid=1037145&group_id=78314&atid=552832
>
>I've found pdfbox to be pretty good. The only time I get problems is 
>with corrupted or egregiously bad PDF files.
>
>B.
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org

