lucene-dev mailing list archives

From "Ogren, Philip V." <>
Subject parsing XML
Date Thu, 29 Nov 2001 16:37:54 GMT

I didn't pore through the archive to make sure no one had done this yet,
but I have a generic way of indexing XML that I think is really useful.
Basically, I implement a DefaultHandler (in SAX) that handles XML
documents that look something like this:
	<field name="myfield1" store="true" index="true" token="true">a small field</field>
	<field name="myfield2" store="false" index="true" token="true">a large field</field>
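
A minimal sketch of what such a handler might look like, using only the
SAX classes that ship with the JDK. The ParsedField class is a
hypothetical stand-in for org.apache.lucene.document.Field so that the
example runs without Lucene on the classpath, and the <doc> wrapper
element is an assumption (the post shows only the field tags, and
well-formed XML needs a single root). This is an illustration, not the
original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FieldHandlerSketch extends DefaultHandler {

    /** Hypothetical stand-in for a Lucene Field: name, text, and the three flags. */
    public static final class ParsedField {
        public final String name;
        public final String value;
        public final boolean store, index, token;

        ParsedField(String name, String value, boolean store, boolean index, boolean token) {
            this.name = name;
            this.value = value;
            this.store = store;
            this.index = index;
            this.token = token;
        }
    }

    private final List<ParsedField> fields = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String name;
    private boolean store, index, token;
    private boolean inField;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("field".equals(qName)) {
            inField = true;
            text.setLength(0);
            name = atts.getValue("name");
            // Boolean.parseBoolean(null) is false, so a missing attribute means "off".
            store = Boolean.parseBoolean(atts.getValue("store"));
            index = Boolean.parseBoolean(atts.getValue("index"));
            token = Boolean.parseBoolean(atts.getValue("token"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate them.
        if (inField) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("field".equals(qName)) {
            // A real indexer would build a Lucene Field here instead.
            fields.add(new ParsedField(name, text.toString(), store, index, token));
            inField = false;
        }
    }

    /** Parses field XML from a string and returns the collected fields. */
    public static List<ParsedField> parse(String xml) throws Exception {
        FieldHandlerSketch handler = new FieldHandlerSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>"
                + "<field name=\"myfield1\" store=\"true\" index=\"true\" token=\"true\">a small field</field>"
                + "<field name=\"myfield2\" store=\"false\" index=\"true\" token=\"true\">a large field</field>"
                + "</doc>";
        for (ParsedField f : parse(xml)) {
            System.out.println(f.name + " (store=" + f.store + "): " + f.value);
        }
    }
}
```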

I haven't actually written a DTD or schema because I haven't needed one
yet.*  I create an org.apache.lucene.document.Field for each 'field' tag that
is processed.  The way I get an XML document that conforms to this very
simplistic schema is through XSLT.  You simply create a style sheet that
transforms your specific XML document into XML that conforms to the above
tags.  It's proven very useful on our project because changing the way an
XML document is indexed requires no change in the code - I simply change my
style sheet and reindex.
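
To illustrate the XSLT step, here is a small sketch that applies a style
sheet with the JDK's javax.xml.transform API. The source format
(<article> with <title> and <body> children), the <doc> wrapper element,
and the field names are all made up for the example; the post does not
show the actual style sheets or source documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FieldXsltSketch {

    // Hypothetical style sheet: maps an <article> document onto 'field' tags.
    // Changing what gets stored or indexed means editing only this text.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/article\">"
          + "<doc>"
          + "<field name=\"title\" store=\"true\" index=\"true\" token=\"true\">"
          + "<xsl:value-of select=\"title\"/></field>"
          + "<field name=\"body\" store=\"false\" index=\"true\" token=\"true\">"
          + "<xsl:value-of select=\"body\"/></field>"
          + "</doc>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Transforms application-specific XML into the generic field XML. */
    public static String toFieldXml(String sourceXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(sourceXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String article = "<article><title>Parsing XML</title>"
                + "<body>Some long text to index.</body></article>";
        System.out.println(toFieldXml(article));
    }
}
```

The output of toFieldXml could then be handed to a SAX handler that
understands the 'field' tags, so the indexing code itself never needs to
know about the source format.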

I would be willing to cut a version of this code that would be suitable for
a demonstration, along with a demo, if there is any interest.

Philip Ogren

*I originally had a 'datefield' tag as well but I found the DateField class
to be useless for my application as it doesn't handle dates before 1970.

> Philip V. Ogren
> Medical Information Resources
> Mayo Clinic Rochester
> (507) 538-0167
