lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhou, Oliver" <Oliver.Z...@cignabehavioral.com>
Subject RE: Lucene refresh index function (incremental indexing).
Date Tue, 25 Nov 2003 16:03:03 GMT
I do have other problems with PDFBox-0.6.4.  For one, it has annoying debug
information at very low level parsing process.  The other, I got infinite
loop while indexing pdf files although they say the infinite loop bug has
been fixed in their release notes.  Anybody knows what's going on?

Thanks,
Oliver

 

-----Original Message-----
From: Ben Litchfield [mailto:ben@csh.rit.edu]
Sent: Tuesday, November 25, 2003 9:45 AM
To: Lucene Users List
Subject: RE: Lucene refresh index function (incremental indexing).



Yes, just add the log4j configuration.  The easiest way to do that is as a
system parameter like this

java -Dlog4j.configuration=log4j.xml org.apache.lucene.demo.IndexHTML
-create -index c:\\index ..

Where log4j.xml is the path to your log4j config, PDFBox has an example
one you can use.

Ben
http://www.pdfbox.org

On Tue, 25 Nov 2003, Zhou, Oliver wrote:

> Lucene doesn't have pdf parser.  In order to index pdf files you have to
add
> one by your self.  PDFBox is a good choice.  You may just ignore the
warning
> for log4j or you can add log4j in your classpath.
>
> Oliver
>
>
> -----Original Message-----
> From: Tun Lin [mailto:chentun@singnet.com.sg]
> Sent: Monday, November 24, 2003 10:07 PM
> To: 'Lucene Users List'
> Subject: RE: Lucene refresh index function (incremental indexing).
>
>
> Does it support indexing the contents of pdf files? I have found one
project
> called PDFBox that can be integrated with Lucene to search inside of the
pdf
> files. Currently, Lucene can only search for the pdf filename. I tried
with
> PDFBox and I got the following message when I typed the command: java
> org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
>
> log4j:WARN No appenders could be found for logger
> (org.pdfbox.pdfparser.PDFParse
> r).
> log4j:WARN Please initialize the log4j system properly.
>
> Can anyone advise?
>
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@lucene.com]
> Sent: Tuesday, November 25, 2003 5:01 AM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> Tun Lin wrote:
> > These are the steps I took:
> >
> > 1) I compile all the files in a particular directory using the command:
> > java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> > , putting all the indexed files in c:\\index.
> > 2) Everytime, I added an additional file in that directory. I need to
> > reindex/recompile that directory to generate the indexes again. As the
> > directory gets larger, the indexing takes a longer time.
> >
> > My question is how do I generate the indexes automatically everytime a
> > new document is added in that directory without me recompiling everytime
> manually?
>
> To update, try removing the '-create' from the command line.  The demo
code
> supports incremental updates.  It will re-scan the directory and figure
out
> which files have changed, what new files have appeared and which
previously
> existing files have been removed.
>
> Doug
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message