Return-Path: Delivered-To: apmail-jakarta-lucene-user-archive@www.apache.org Received: (qmail 52354 invoked from network); 23 Aug 2004 03:40:31 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur-2.apache.org with SMTP; 23 Aug 2004 03:40:31 -0000 Received: (qmail 67322 invoked by uid 500); 23 Aug 2004 03:40:23 -0000 Delivered-To: apmail-jakarta-lucene-user-archive@jakarta.apache.org Received: (qmail 67291 invoked by uid 500); 23 Aug 2004 03:40:22 -0000 Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Precedence: bulk List-Unsubscribe: List-Subscribe: List-Help: List-Post: List-Id: "Lucene Users List" Reply-To: "Lucene Users List" Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 67275 invoked by uid 99); 23 Aug 2004 03:40:22 -0000 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received: from [203.199.26.74] (HELO daakghar.controlnet.co.in) (203.199.26.74) by apache.org (qpsmtpd/0.27.1) with SMTP; Sun, 22 Aug 2004 20:40:17 -0700 Received: from karthik ([192.168.4.1]) by dakiya.controlnet.co.in (Netscape Messaging Server 4.15) with ESMTP id I2VS5D00.F7L for ; Mon, 23 Aug 2004 09:23:37 +0530 From: "Karthik N S" To: "Lucene Users List" Subject: RE: pdfboxhelp Date: Mon, 23 Aug 2004 09:21:40 +0530 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal In-Reply-To: <018f01c48771$8c401280$4801a8c0@sprosys.com> X-Virus-Checked: Checked X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N Hi To Begin with try to build Indexes offline [ out of Tomcat container] and on completing indxexes, feed u'r search with the real path of the offline indexed folder,Start the Tomcat and then use the search on.... As u experiment it out u will be comfortable with requirment of Indexing /Search...... ; [ Karthik -----Original Message----- From: Santosh [mailto:santosh.s@softprosys.com] Sent: Saturday, August 21, 2004 4:55 PM To: Lucene Users List Subject: Re: pdfboxhelp Yes I did the same. I copied all the classes into classes folder but now when I am building the index using IndexHTML the pdfs are not added to this index, only text and htmls are added to index. what changes should I do for IndexHTML.java to build index with pdf ----- Original Message ----- From: "Karthik N S" To: "Lucene Users List" Sent: Saturday, August 21, 2004 4:54 PM Subject: RE: pdfboxhelp > Hi > > If u are using the jar file with Web Interface for jsp/servlet dev, Place > the jar file in "webapps///lib" > and also correct the Classpath for the present modification. > > 2)create u'r own package and put all u'r java files copy the java files to > /Web-inf/Classes/ > > > Then use the same......;{ > > > Karthik > > -----Original Message----- > From: Santosh [mailto:santosh.s@softprosys.com] > Sent: Saturday, August 21, 2004 4:31 PM > To: Lucene Users List > Subject: Re: pdfboxhelp > > > thanks Natarajan and karthik, > > I corrected classpath > > but where I should write your code? > should I write your code in IndexHTML.java which comes along with lucene or > some other place? > one more thing > I kept pdfbox jar file in the classpath is this enough or I have to build > the pdfbox? > > thankyou > ----- Original Message ----- > From: "Natarajan.T" > To: "'Lucene Users List'" > Sent: Saturday, August 21, 2004 3:20 PM > Subject: RE: pdfboxhelp > > > > Hi Santhosh, > > > > Try out this below code.....(pdfbox.jar file must be in your classpath) > > > > public String getContent(InputStream reader) throws IOException{PDFParser > parser = null;PDDocument pdDoc = null;PDFTextStripper stripper = null;String > pdftext = "";try{parser = new PDFParser(reader);parser.parse();pdDoc = > parser.getPDDocument();if(pdDoc.isEncrypted()){DecryptDocument decryptor = > new > > DecryptDocument(pdDoc);decryptor.decryptDocument("");}stripper = new > PDFTextStripper();pdftext = stripper.getText(pdDoc); > > > > info = pdDoc.getDocumentInformation();}catch(Exception err) > {System.out.println(err.getMessage());}pdDoc.close();return pdftext;} > > > > Natarajan. > > > > -----Original Message----- > > From: Santosh [mailto:santosh.s@softprosys.com] > > Sent: Saturday, August 21, 2004 3:14 PM > > To: Lucene Users List > > Subject: Re: pdfboxhelp > > > > Hi Don, > > > > your Idea is nice, but whenever I write the following code in > > IndexHTML.java of lucene > > > > > > import org.pdfbox.searchengine.lucene.*; > > > > File pdfFile = new File("/path/to/the/file.pdf"); > > > > // Below returns a parse PDF file in a Lucene Document object. > > Document doc = LucenePDFDocument.getDocument(pdfFile); > > > > Iam getting the following error > > > > package org.pdfbox.searchengine.lucene does not exist > > > > I have downloaded pdfbox source code and kept the jar file in the > > classpath, please help me on this----- Original Message ----- From: Don > Vaillancourt To: Lucene Users List Sent: Friday, August 20, 2004 7:37 > PMSubject: Re: pdfboxhelp > > > > > > Here is the super simple code required. > > > > import org.pdfbox.searchengine.lucene.*; > > > > File pdfFile = new File("/path/to/the/file.pdf"); > > > > // Below returns a parse PDF file in a Lucene Document object.Document > doc = LucenePDFDocument.getDocument(pdfFile); > > > > Santosh wrote: > > > > exactly, the same is required to me----- Original Message ----- From: Don > Vaillancourt To: Lucene Users List Sent: Friday, August 20, 2004 6:39 > PMSubject: Re: pdfboxhelp > > > > > > What are your intensions with PDFBox? > > > > You want to use it to index PDF files? > > > > Santosh wrote: > > > > hi, > > > > I have downloaded pdfbox zip. but i am in ambigous state that where to > > start. how can I check with demo, I dont see any help document with this > > download, please help me. > > > > > > regards > > Santosh kumar > > SoftPro Systems > > Hyderabad > > > > > > "The harder you train in peace, the lesser you bleed in war" > > > > -----------------------SOFTPRO DISCLAIMER------------------------------ > > > > Information contained in this E-MAIL and any attachments are > > confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' > > and 'confidential'. > > > > If you are not an intended or authorised recipient of this E-MAIL or > > have received it in error, You are notified that any use, copying or > > dissemination of the information contained in this E-MAIL in any > > manner whatsoever is strictly prohibited. Please delete it immediately > > and notify the sender by E-MAIL. > > > > In such a case reading, reproducing, printing or further dissemination > > of this E-MAIL is strictly prohibited and may be unlawful. > > > > SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment > > hereto is free from computer viruses or other defects. > > > > The opinions expressed in this E-MAIL and any ATTACHEMENTS may be > > those of the author and are not necessarily those of SOFTPRO SYSTEMS. > > ------------------------------------------------------------------------ > > > > > > > > > > > > -- Don VaillancourtDirector of Software Development > > > > WEB IMPACT INC.phone: 416-815-2000 ext. 245fax: 416-815-2001email: > donv@web-impact.comweb: http://www.web-impact.com > > > > > > > > This email message is intended only for the addressee(s)and contains > information that may be confidential and/orcopyright. If you are not the > intended recipient pleasenotify the sender by reply email and immediately > deletethis email. Use, disclosure or reproduction of this emailby anyone > other than the intended recipient(s) is strictlyprohibited. No > representation is made that this email orany attachments are free of > viruses. Virus scanning isrecommended and is the responsibility of the > recipient. > > > > > > > > -----------------------SOFTPRO DISCLAIMER------------------------------ > > > > Information contained in this E-MAIL and any attachments are > > confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' > > and 'confidential'. > > > > If you are not an intended or authorised recipient of this E-MAIL or > > have received it in error, You are notified that any use, copying or > > dissemination of the information contained in this E-MAIL in any > > manner whatsoever is strictly prohibited. Please delete it immediately > > and notify the sender by E-MAIL. > > > > In such a case reading, reproducing, printing or further dissemination > > of this E-MAIL is strictly prohibited and may be unlawful. > > > > SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment > > hereto is free from computer viruses or other defects. > > > > The opinions expressed in this E-MAIL and any ATTACHEMENTS may be > > those of the author and are not necessarily those of SOFTPRO SYSTEMS. > > ------------------------------------------------------------------------ > > > > > > > > > > > > ------------------------------------------------------------------------ > > ------ > > > > >---------------------------------------------------------------------To > unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.orgFor > additional commands, e-mail: lucene-user-help@jakarta.apache.org > > > > -----------------------SOFTPRO DISCLAIMER------------------------------ > > > > Information contained in this E-MAIL and any attachments are > > confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' > > and 'confidential'. > > > > If you are not an intended or authorised recipient of this E-MAIL or > > have received it in error, You are notified that any use, copying or > > dissemination of the information contained in this E-MAIL in any > > manner whatsoever is strictly prohibited. Please delete it immediately > > and notify the sender by E-MAIL. > > > > In such a case reading, reproducing, printing or further dissemination > > of this E-MAIL is strictly prohibited and may be unlawful. > > > > SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment > > hereto is free from computer viruses or other defects. > > > > The opinions expressed in this E-MAIL and any ATTACHEMENTS may be > > those of the author and are not necessarily those of SOFTPRO SYSTEMS. > > ------------------------------------------------------------------------ > > > > > > > > > > > > -- Don VaillancourtDirector of Software Development > > > > WEB IMPACT INC.phone: 416-815-2000 ext. 245fax: 416-815-2001email: > donv@web-impact.comweb: http://www.web-impact.com > > > > > > > > This email message is intended only for the addressee(s)and contains > information that may be confidential and/orcopyright. If you are not the > intended recipient pleasenotify the sender by reply email and immediately > deletethis email. Use, disclosure or reproduction of this emailby anyone > other than the intended recipient(s) is strictlyprohibited. No > representation is made that this email orany attachments are free of > viruses. Virus scanning isrecommended and is the responsibility of the > recipient. > > > > > > > > -----------------------SOFTPRO DISCLAIMER------------------------------ > > > > Information contained in this E-MAIL and any attachments are > > confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' > > and 'confidential'. > > > > If you are not an intended or authorised recipient of this E-MAIL or > > have received it in error, You are notified that any use, copying or > > dissemination of the information contained in this E-MAIL in any > > manner whatsoever is strictly prohibited. Please delete it immediately > > and notify the sender by E-MAIL. > > > > In such a case reading, reproducing, printing or further dissemination > > of this E-MAIL is strictly prohibited and may be unlawful. > > > > SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment > > hereto is free from computer viruses or other defects. > > > > The opinions expressed in this E-MAIL and any ATTACHEMENTS may be > > those of the author and are not necessarily those of SOFTPRO SYSTEMS. > > ------------------------------------------------------------------------ > > > > > > > > > > > > ------------------------------------------------------------------------ > > ------ > > > > >---------------------------------------------------------------------To > unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.orgFor > additional commands, e-mail: lucene-user-help@jakarta.apache.org > > > > -----------------------SOFTPRO DISCLAIMER------------------------------ > > > > Information contained in this E-MAIL and any attachments are > > confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' > > and 'confidential'. > > > > If you are not an intended or authorised recipient of this E-MAIL or > > have received it in error, You are notified that any use, copying or > > dissemination of the information contained in this E-MAIL in any > > manner whatsoever is strictly prohibited. Please delete it immediately > > and notify the sender by E-MAIL. > > > > In such a case reading, reproducing, printing or further dissemination > > of this E-MAIL is strictly prohibited and may be unlawful. > > > > SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment > > hereto is free from computer viruses or other defects. > > > > The opinions expressed in this E-MAIL and any ATTACHEMENTS may be > > those of the author and are not necessarily those of SOFTPRO SYSTEMS. > > ------------------------------------------------------------------------ > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > For additional commands, e-mail: lucene-user-help@jakarta.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > For additional commands, e-mail: lucene-user-help@jakarta.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org