lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sergiu gordea <gser...@ifit.uni-klu.ac.at>
Subject Re: Need advice: what pdf lib to use?
Date Mon, 25 Oct 2004 16:07:52 GMT
Ben Litchfield wrote:

>In order to write software that consumes PDF documents you must agree to a
>list of conditions.  One of those conditions is that permissions specified
>by the author of the PDF document are respected.
>
>PDFBox complies with this statement, if there is software that does not
>then they are in violation of copyright law.
>
>  
>
I wanted to say something like this in one of my previous emails, when I 
said that  anyone can modify the code of
PDFBox to replace the restrictions

>That being said, PDFBox is open source so a user could make modifications
>to the source code, or as a PDF library could change permissions on a
>document.
>  
>
This seems to me as beeing a business decision,

 Iouli .... if your boss tels you that PDFBox is useless because it 
prevents you to get the text from protected pdfs,
than you should say him ... I can fix it but it is not legal. You can 
hack PDFbox, but before doing this you should
ensure that the authors let you do it.

 All the best,

  Sergiu
 

>Ben
>
>On Mon, 25 Oct 2004 iouli.golovatyi@group.novartis.com wrote:
>
>  
>
>>Yes Ben, You are right.
>>
>>This would be correct functionality from technical perspective. But look
>>it my way with application programmer eyes reporting to big boss that c.
>>30% of doc we cope with could not be indexed because of this stupid
>>limitation. Neither he or me have any influence on pdf owners and any
>>ideas about what made  them create files with documet security set.
>>
>>In short, if You also could implement this "uncorrect functionality"  the
>>"closed source" guys did, it would be really great!
>>
>>As far as sponsoring is concerned I would be ready to hack (or at least to
>>try) it even for 1/3 of that fortune:)))
>>
>>J.
>>
>>
>>
>>
>>
>>Ben Litchfield <ben@csh.rit.edu>
>>25.10.2004 14:02
>>Please respond to "Lucene Users List"
>>
>>
>>        To:     Lucene Users List <lucene-user@jakarta.apache.org>
>>        cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
>>        Subject:        Re: Need advice: what pdf lib to use?
>>        Category:
>>
>>
>>
>>
>>PDFBox does not 'stumble' when it gives that message, that is correct
>>functionality if that permission is not allowed.
>>
>>If your company is willing to pay a 'fortune' why not sponsor a change to
>>an open source project for half a fortune.
>>
>>Ben
>>http://www.pdfbox.org
>>
>>On Mon, 25 Oct 2004 iouli.golovatyi@group.novartis.com wrote:
>>
>>    
>>
>>>PDFbox stumbles also with "class java.io.IOException with message:  -
>>>      
>>>
>>You
>>    
>>
>>>do not have permission to extract text" in case the doc is copy/print
>>>protected.
>>>I tested now the snowtide commercial product and it looks like it could
>>>process these files as well. Performance was also not so bad.
>>>      
>>>
>>Unfortunatly
>>    
>>
>>>the test result could not be considered as 100%, because the free
>>>      
>>>
>>version
>>    
>>
>>>processed just first  8  pages.  After all this product costs a fortune
>>>(as long the company is ready to pay I don't realy mind:))
>>>
>>>J.
>>>
>>>
>>>
>>>
>>>
>>>Robert Newson <rnewson@connected.com>
>>>Sent by: news <news@sea.gmane.org>
>>>24.10.2004 17:44
>>>Please respond to "Lucene Users List"
>>>
>>>
>>>        To:     lucene-user@jakarta.apache.org
>>>        cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
>>>        Subject:        Re: Need advice: what pdf lib to use?
>>>        Category:
>>>
>>>
>>>
>>>iouli.golovatyi@group.novartis.com wrote:
>>>      
>>>
>>>>Hello all,
>>>>
>>>>I need a piece of advice/experience..
>>>>
>>>>What pdf parser (written in java) u'd recommend?
>>>>
>>>>I played now with PDFBox-0.6.7a and would not say I was satisfied too
>>>>        
>>>>
>>>much
>>>      
>>>
>>>>with it
>>>>
>>>>On certain pdf's (not well formated but anyway readable with acrobate)
>>>>        
>>>>
>>>it
>>>      
>>>
>>>>run into dead loop (this I could fix in code),
>>>>and on one file it produced "out of memory error" and killed jvm:(
>>>>        
>>>>
>>(this
>>    
>>
>>>>problem I could not identify yet)
>>>>
>>>>After all the performance was not too great as well: it took c. 19 h.
>>>>        
>>>>
>>to
>>    
>>
>>>>index 13000 files (c. 3.5Gb)
>>>>
>>>>Regards,
>>>>J.
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>On the specific problem of the "dead loop", I reported an instance of
>>>this to Ben a week or so ago and he has fixed it in the latest
>>>nightlies.  I expect an official release will include this bugfix soon.
>>>The file in question was unreadable with any PDF software I have, but
>>>someone managed to create it somehow...
>>>
>>>http://sourceforge.net/tracker/index.php?func=detail&aid=1037145&group_id=78314&atid=552832
>>>
>>>I've found pdfbox to be pretty good. The only time I get problems is
>>>with corrupted or egregiously bad PDF files.
>>>
>>>B.
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>
>>>
>>>
>>>      
>>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>
>>
>>    
>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message