lucene-java-user mailing list archives

From Daniel Susanto <>
Subject Re: Indexing Complex XML
Date Sat, 18 Apr 2009 18:09:24 GMT
Thanks Erick,

By more complex XML I mean, for example, something like this:

<book>
  <title>Lucene Book</title>
  <authors>
    <author>Book author 1</author>
    <author>Book author 2</author>
  </authors>
  <summary>Book for Lucene</summary>
</book>
<book>
  <title>Lucene Book 2</title>
  <authors>
    <author>Book 2 author 1</author>
    <author>Book 2 author 2</author>
  </authors>
  <summary>Book 2 for Lucene</summary>
</book>


So each 'book' node is handled by one Document, right? And how do I
handle the 'authors' node? Should I put it in a new Document, or something else?

thx. :)
Daniel Susanto

--- On Sun, 4/19/09, Erick Erickson <> wrote:

From: Erick Erickson <>
Subject: Re: Indexing Complex XML
Date: Sunday, April 19, 2009, 12:01 AM

Lucene is an *engine*, not an application. *You* have to process the
XML, decide what the structure of your index is, and index the data. There
are many XML parser options; this is just straight Java code. You'll decide
what's relevant, add the contents of the relevant elements to a Lucene
Document, then add that to your index.

Similarly for searching.

So, say you have the following simple XML doc:
   <ele1>ele 1 text</ele1>
   <ele2>ele 2 text</ele2>

You'd have to parse that text, then, say, add (semi-pseudo-code):
Document doc = new Document();
// StoreOption and IndexOption are placeholders, e.g. Field.Store.YES and Field.Index.ANALYZED
doc.add(new Field("ele1field", "ele 1 text", StoreOption, IndexOption));
doc.add(new Field("ele2field", "ele 2 text", StoreOption, IndexOption));

Then at search time you'd form your queries on "ele1field" and "ele2field".
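
A minimal sketch of the parsing half, using only the JDK's built-in DOM parser. The class name XmlFieldExtractor, the method extractFields, and the <doc> wrapper element are assumptions made for illustration; none of them come from Lucene or from the messages above. The sketch walks the direct children of the root element and collects their text under a field name, so an element that appears more than once simply yields several values for the same field.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Lucene): collects element text by field name.
class XmlFieldExtractor {

    // Maps each child element of the root to a field name ("ele1" -> "ele1field")
    // and gathers its trimmed text. Repeated elements give several values per field.
    static Map<String, List<String>> extractFields(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();

        Map<String, List<String>> fields = new LinkedHashMap<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            String fieldName = child.getNodeName() + "field";
            fields.computeIfAbsent(fieldName, k -> new ArrayList<>())
                  .add(child.getTextContent().trim());
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // The <doc> root is an assumed wrapper so the snippet is well-formed XML.
        String xml = "<doc><ele1>ele 1 text</ele1><ele2>ele 2 text</ele2></doc>";
        for (Map.Entry<String, List<String>> entry : extractFields(xml).entrySet()) {
            for (String value : entry.getValue()) {
                // With Lucene on the classpath, this is where each value would be
                // added to the Document as a Field named entry.getKey().
                System.out.println(entry.getKey() + " = " + value);
            }
        }
    }
}
```

Each name/value pair would then go into the Document as its own Field. Lucene accepts several fields with the same name in one Document, so repeated elements do not need special handling at this stage.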


On Sat, Apr 18, 2009 at 11:19 AM, daniel susanto <> wrote:

> Hi,
> I need advice or an example for indexing a complex XML file. I mean XML with
> not just one level of nodes but more than one, for example RSS or Atom.
> Thanks in advance.
> Daniel Susanto
