lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Linwood <j...@greenninja.com>
Subject Re: Indexing XML with Lucene
Date Fri, 14 Feb 2003 14:38:45 GMT
Hi,

To use Lucene, you will have to have some way of creating Lucene 
Document objects, which you can then add to a Lucene index.

Most of the translation work is custom, because everyone's got different 
XML DTD's and schemas.  There are examples of indexing XML with DOM and 
SAX for Lucene in the Lucene Sandbox.  

There are probably a few steps you will need to take:

1) Figure out how your XML Schema or DTD maps to a Lucene Document - 
which XML elements are going to be which Lucene Fields, and how are they 
going to be indexed and stored?
2) Write a JUnit test that will be used to test your document conversion 
utility.  It should take your XML documents, run them through your 
converter, and then check the fields in the Lucene document to make sure 
they are what you want. Do this for each type of XML document that you have.
3) Write conversion code that translates an XML document to a Lucene 
document. Do this for each type of XML document that you have.
4) Write an indexer utility that goes through your XML database and 
feeds XML documents throught the conversion utility and then into the 
Lucene indexer.

I might be forgetting one or two steps here :)

Jeff

Pierre Lacchini wrote:

>Hello,
>
>I'm using Lucene, and I need to index an XML Database (Tamino).
>How can I do that ? Do i have to use an XML parser as Digester ?
>
>I'm kinda noob with Lucene, and I really need help ;)
>
>Thx !Pierre Lacchini
>Consultant développement
>
>PeopleWare
>12, rue du Cimetière
>L-8413 Steinfort
>Phone : + 352 399 968 35
>http://www.peopleware.lu
>
>
>
>  
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message