# lucene-java-user mailing list archives

From "Santosh" <santos...@softprosys.com>
Subject Fw: pdfboxhelp
Date Mon, 23 Aug 2004 05:20:47 GMT
```
Hi Karthik,
Did you find a solution? Should I send the PDF to you?
----- Original Message -----
From: "Santosh" <santosh.s@softprosys.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, August 23, 2004 10:23 AM
Subject: Re: pdfboxhelp

> Hi Karthik,
> I put log4j in the classpath. Here is my CLASSPATH variable:
>
> CLASSPATH
>
> .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar
>
>
> ----- Original Message -----
> From: "Karthik N S" <karthik@controlnet.co.in>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, August 23, 2004 10:26 AM
> Subject: RE: pdfboxhelp
>
>
> > Hi Santosh,
> >
> >   I think your PDFBox setup is using the Log4j package. Try adding the
> > path to log4j.jar to the classpath.
> >
> >   Is it just a WARNING or an ERROR that you are getting?
> >
> >   Send me your configuration details and let me help you with it.
> >
> >
> > Karthik
> >
> > -----Original Message-----
> > From: Santosh [mailto:santosh.s@softprosys.com]
> > Sent: Monday, August 23, 2004 10:11 AM
> > To: Lucene Users List
> > Cc: Ben Litchfield
> > Subject: Re: pdfboxhelp
> >
> >
> > Hi Karthik,
> >
> > I have downloaded PDFBox and put the PDFBox jar file in the classpath, but
> > when I type the following command at the command prompt I get this error:
> >
> > D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
> > log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
> > log4j:WARN Please initialize the log4j system properly
> >
> > Why am I getting this error? Please help.
> >
> >
> > ----- Original Message -----
> > From: "Karthik N S" <karthik@controlnet.co.in>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Monday, August 23, 2004 9:21 AM
> > Subject: RE: pdfboxhelp
> >
> >
> > > Hi,
> > >
> > >     To begin with, try to build the indexes offline (outside the Tomcat
> > > container). Once the indexes are complete, give your search the real path
> > > of the offline index folder, start Tomcat, and then run the search. As you
> > > experiment, you will become comfortable with the requirements of
> > > indexing and search.
> > >
> > > Karthik
> > >
> > > -----Original Message-----
> > > From: Santosh [mailto:santosh.s@softprosys.com]
> > > Sent: Saturday, August 21, 2004 4:55 PM
> > > To: Lucene Users List
> > > Subject: Re: pdfboxhelp
> > >
> > >
> > > Yes, I did the same.
> > > I copied all the classes into the classes folder, but now when I build
> > > the index using IndexHTML, the PDFs are not added to the index; only
> > > text and HTML files are added.
> > > What changes should I make to IndexHTML.java to build the index with PDFs?
> > > ----- Original Message -----
> > > From: "Karthik N S" <karthik@controlnet.co.in>
> > > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > > Sent: Saturday, August 21, 2004 4:54 PM
> > > Subject: RE: pdfboxhelp
> > >
> > >
> > > > Hi,
> > > >
> > > > 1) If you are using the jar file with a web interface for JSP/servlet
> > > > development, place the jar file in
> > > > "webapps/<your application>/WEB-INF/lib"
> > > > and also correct the classpath to reflect this change.
> > > >
> > > > 2) Create your own package, put all your Java files in it, and copy the
> > > > compiled classes to
> > > > /WEB-INF/classes/<your package>
> > > >
> > > >
> > > > Then use the same.
> > > >
> > > >
> > > > Karthik
> > > >
> > > > -----Original Message-----
> > > > From: Santosh [mailto:santosh.s@softprosys.com]
> > > > Sent: Saturday, August 21, 2004 4:31 PM
> > > > To: Lucene Users List
> > > > Subject: Re: pdfboxhelp
> > > >
> > > >
> > > > Thanks, Natarajan and Karthik.
> > > >
> > > > I corrected the classpath,
> > > >
> > > > but where should I write your code?
> > > > Should I write it in IndexHTML.java, which comes with Lucene, or in
> > > > some other place?
> > > > One more thing:
> > > > I put the PDFBox jar file in the classpath. Is this enough, or do I have
> > > > to build PDFBox?
> > > >
> > > > Thank you
> > > > ----- Original Message -----
> > > > From: "Natarajan.T" <natarajant@crimsonlogic.co.in>
> > > > To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
> > > > Sent: Saturday, August 21, 2004 3:20 PM
> > > > Subject: RE: pdfboxhelp
> > > >
> > > >
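> > > > > Hi Santhosh,
> > > > >
> > > > > Try the code below (the pdfbox.jar file must be in your classpath).
> > > > >
> > > > > public String getContent(InputStream reader) throws IOException {
> > > > >     PDFParser parser = null;
> > > > >     PDDocument pdDoc = null;
> > > > >     PDFTextStripper stripper = null;
> > > > >     String pdftext = "";
> > > > >     try {
> > > > >         parser = new PDFParser(reader);
> > > > >         parser.parse();
> > > > >         pdDoc = parser.getPDDocument();
> > > > >         if (pdDoc.isEncrypted()) {
> > > > >             // Try to decrypt with an empty password.
> > > > >             DecryptDocument decryptor = new DecryptDocument(pdDoc);
> > > > >             decryptor.decryptDocument("");
> > > > >         }
> > > > >         stripper = new PDFTextStripper();
> > > > >         pdftext = stripper.getText(pdDoc);
> > > > >         // "info" is not declared in this snippet; it is presumably a
> > > > >         // field of the enclosing class.
> > > > >         info = pdDoc.getDocumentInformation();
> > > > >     } catch (Exception err) {
> > > > >         System.out.println(err.getMessage());
> > > > >     }
> > > > >     // pdDoc is still null if parsing failed, so guard the close.
> > > > >     if (pdDoc != null) {
> > > > >         pdDoc.close();
> > > > >     }
> > > > >     return pdftext;
> > > > > }
> > > > >
> > > > > Natarajan.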
> > > > >
> > > > > -----Original Message-----
> > > > > From: Santosh [mailto:santosh.s@softprosys.com]
> > > > > Sent: Saturday, August 21, 2004 3:14 PM
> > > > > To: Lucene Users List
> > > > > Subject: Re: pdfboxhelp
> > > > >
> > > > > Hi Don,
> > > > >
> > > > > Your idea is nice, but whenever I write the following code in
> > > > > IndexHTML.java of Lucene
> > > > >
> > > > >
> > > > > import org.pdfbox.searchengine.lucene.*;
> > > > >
> > > > > File pdfFile = new File("/path/to/the/file.pdf");
> > > > >
> > > > > // Below returns a parsed PDF file in a Lucene Document object.
> > > > > Document doc = LucenePDFDocument.getDocument(pdfFile);
> > > > >
> > > > > I am getting the following error:
> > > > >
> > > > > package org.pdfbox.searchengine.lucene does not exist
> > > > >
> > > > > I have downloaded pdfbox source code and kept the jar file in the
> > > > >
> > > > > From: Don Vaillancourt
> > > > > To: Lucene Users List
> > > > > Sent: Friday, August 20, 2004 7:37 PM
> > > > > Subject: Re: pdfboxhelp
> > > > >
> > > > >
> > > > >   Here is the super simple code required.
> > > > >
> > > > >   import org.pdfbox.searchengine.lucene.*;
> > > > >
> > > > >   File pdfFile = new File("/path/to/the/file.pdf");
> > > > >
> > > > >   // Below returns a parsed PDF file in a Lucene Document object.
> > > > >   Document doc = LucenePDFDocument.getDocument(pdfFile);
> > > > >
> > > > >   Santosh wrote:
> > > > >
> > > > > Exactly, that is what I need.
> > > > >
> > > > > ----- Original Message -----
> > > > > From: Don Vaillancourt
> > > > > To: Lucene Users List
> > > > > Sent: Friday, August 20, 2004 6:39 PM
> > > > > Subject: Re: pdfboxhelp
> > > > >
> > > > >
> > > > >   What are your intentions with PDFBox?
> > > > >
> > > > >   Do you want to use it to index PDF files?
> > > > >
> > > > >   Santosh wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I have downloaded the PDFBox zip, but I am unsure where to start.
> > > > > How can I try it with a demo? I don't see any help document with it.
> > > > >
> > > > >
> > > > > regards
> > > > > Santosh kumar
> > > > > SoftPro Systems
> > > > > Hyderabad
> > > > >
> > > > >
> > > > > "The harder you train in peace, the lesser you bleed in war"
> > > > >
> > > > > -----------------------SOFTPRO DISCLAIMER------------------------------
> > > > >
> > > > > Information contained in this E-MAIL and any attachments are
> > > > > confidential being proprietary to SOFTPRO SYSTEMS is 'privileged'
> > > > > and 'confidential'.
> > > > >
> > > > > If you are not an intended or authorised recipient of this E-MAIL or
> > > > > have received it in error, You are notified that any use, copying or
> > > > > dissemination of the information contained in this E-MAIL in any
> > > > > manner whatsoever is strictly prohibited. Please delete it immediately
> > > > > and notify the sender by E-MAIL.
> > > > >
> > > > > In such a case reading, reproducing, printing or further dissemination
> > > > > of this E-MAIL is strictly prohibited and may be unlawful.
> > > > >
> > > > > SOFTPRO SYSTEMS does not REPRESENT or WARRANT that an attachment
> > > > > hereto is free from computer viruses or other defects.
> > > > >
> > > > > The opinions expressed in this E-MAIL and any ATTACHMENTS may be
> > > > > those of the author and are not necessarily those of SOFTPRO SYSTEMS.
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > >   -- Don Vaillancourt
> > > > >   Director of Software Development
> > > > >
> > > > >   WEB IMPACT INC.
> > > > >   phone: 416-815-2000 ext. 245
> > > > >   fax: 416-815-2001
> > > > >   email: donv@web-impact.com
> > > > >   web: http://www.web-impact.com
> > > > >
> > > > >   This email message is intended only for the addressee(s) and contains
> > > > >   information that may be confidential and/or copyright. If you are not
> > > > >   the intended recipient please notify the sender by reply email and
> > > > >   immediately delete this email. Use, disclosure or reproduction of this
> > > > >   email by anyone other than the intended recipient(s) is strictly
> > > > >   prohibited. No representation is made that this email or any
> > > > >   attachments are free of viruses. Virus scanning is recommended and is
> > > > >   the responsibility of the recipient.
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org