lucene-general mailing list archives

From payo <pay...@yahoo.com>
Subject Index remotely documents
Date Wed, 12 Sep 2007 16:27:23 GMT

Hi all,

How can I index remote documents (PDF, HTML, XML)?

I am using Lucene 2.0.0.

Currently I use the following commands.

To clean up the HTML before parsing:

java org.w3c.tidy.Tidy -m *.html

To index the HTML:

java org.apache.lucene.demo.IndexHTML -create -index index .\

To index the PDF files:

java org.pdfbox.searchengine.lucene.IndexFiles -create -index C:\tomcat\webapps\luceneweb\index .\

But how can I parse XML?

At the moment I use

java dom.DOMFilter *.xml

but how can I index the XML?
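
One possible direction, given only as a rough sketch: parse each XML file with the JDK's built-in DOM parser and collect the text of every leaf element, so that each name/text pair can later become a field of an index document. The class name `XmlFieldExtractor` and the method `extractFields` are hypothetical names made up for this illustration, and the sketch assumes that leaf elements carry the text worth indexing (text mixed in between child elements is ignored).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns an XML string into name -> text pairs.
public class XmlFieldExtractor {

    // Parses the XML and maps each leaf element name to its trimmed text.
    public static Map<String, String> extractFields(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));
        Map<String, String> fields = new LinkedHashMap<>();
        collect(dom.getDocumentElement(), fields);
        return fields;
    }

    // Walks the tree depth-first; only elements without element children
    // contribute text (assumption: mixed content is not needed).
    private static void collect(Node node, Map<String, String> fields) {
        NodeList children = node.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                collect(child, fields);
            }
        }
        if (!hasElementChild) {
            String text = node.getTextContent().trim();
            if (!text.isEmpty()) {
                // Repeated element names are joined with a single space.
                fields.merge(node.getNodeName(), text, (a, b) -> a + " " + b);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Sample</title><body>Hello world</body></doc>";
        for (Map.Entry<String, String> e : extractFields(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Each resulting pair could then be added to a Lucene `Document` as a field (in Lucene 2.0, presumably via `new Field(name, value, Field.Store.YES, Field.Index.TOKENIZED)`) and written with an `IndexWriter`, in the same way the HTML and PDF demo indexers do. For remote files, the same parser could in principle read from a stream opened on a URL instead of a string.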


Thanks
-- 
View this message in context: http://www.nabble.com/Index-remotely-documents-tf4430491.html#a12639240
Sent from the Lucene - General mailing list archive at Nabble.com.

