lucene-general mailing list archives

From Chris Hostetter
Subject Re: a single index
Date Wed, 12 Sep 2007 18:47:23 GMT

Please note that "Lucene" is a Java library for building applications.  
the examples you refer to below are two applications built with the Lucene 
library -- those applications are actually just demonstrations of the 
types of things that are possible using the Lucene library (and the PDFBox 
library).

if you want to do more complicated things you either need to write your own 
application (you can base it off the sample code you are currently 
running) or you need to look into existing applications.

in the first case, please consult the java-user@lucene mailing list if you 
need assistance.

in the second case, it may help to review this list of applications...

...based on the situation you describe however, i would think that Nutch 
may be the best place for you to start...
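
For the "write your own application" route, here is a minimal sketch of one program that puts both kinds of documents into a single index. It assumes a recent Lucene release (8.x or 9.x, lucene-core on the classpath), not the 2007-era API the demos were written against. The class name, field names ("path", "type", "contents") and file paths are made up for illustration, and the text extraction step (JTidy or PDFBox in your case) is left out: the strings below stand in for whatever text you extract. The point is only that the writer is opened in append mode, so a second pass adds to the existing index instead of starting a fresh one, which is presumably what the -create flag in the commands quoted below does.

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SingleIndexSketch {

    // Open a writer that keeps whatever is already in the index
    // (and creates the index if it does not exist yet).
    static IndexWriter openForAppend(Directory dir) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);
        return new IndexWriter(dir, config);
    }

    // Add one file to the index. "type" is a hypothetical field that lets
    // the search page tell HTML hits from PDF hits.
    static void addFile(IndexWriter writer, String path, String type, String text)
            throws IOException {
        Document doc = new Document();
        doc.add(new StringField("path", path, Field.Store.YES));
        doc.add(new StringField("type", type, Field.Store.YES));
        doc.add(new TextField("contents", text, Field.Store.NO));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws IOException {
        // "index" is an assumed directory name; both passes use the same one.
        try (Directory dir = FSDirectory.open(Paths.get("index"))) {
            // First pass: HTML documents (placeholder text, extraction not shown).
            try (IndexWriter writer = openForAppend(dir)) {
                addFile(writer, "docs/page.html", "html", "text extracted from the HTML page");
            }
            // Second pass: PDF documents, appended to the same index.
            try (IndexWriter writer = openForAppend(dir)) {
                addFile(writer, "docs/paper.pdf", "pdf", "text extracted from the PDF file");
            }
        }
    }
}
```

Note that this sketch only ever adds documents, so running it twice indexes the same files twice; a real application would need to delete or update existing entries for a path before re-adding it.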

: I am working with lucene and i am new 
: I want to index documents HTML for this I do 
: java org.w3c.tidy.Tidy -m *.html
: java org.apache.lucene.demo.IndexHTML -create -index index .\
: all this generates index to me and when doing my search in the Web if it
: shows to the documents and the summary to me.
: then I index pdf
: org.pdfbox.searchengine.lucene.IndexFiles -create -index pdf \
: this also generates index to me
: but the index PDF replace index HTML
: how I can make him to have single index and  when doing my search in the WEB
: showme as HTML and PDF documents?
: thanks

