lucene-java-user mailing list archives

Subject Re: Indexing Complex XML
Date Sun, 19 Apr 2009 19:05:53 GMT
try vtd-xml 
it works with any XML regardless of complexity 
----- Original Message ----- 
From: "Digy" <> 
Sent: Saturday, April 18, 2009 12:25:21 PM GMT -08:00 US/Canada Pacific 
Subject: RE: Indexing Complex XML 

doc.add(new Field("authors", "name1 surname1 name2 surname2", StoreOption, IndexOption));

So you can make a search like 
authors:"name1 surname1" 

(Disadvantage: you will also get a result for a search like
authors:"surname1 name2", which spans two different authors.)
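To make the disadvantage concrete, here is a toy model of it in plain Java. It is not Lucene code: the class and method names are made up for illustration, and it assumes simple whitespace tokenization and a phrase query that matches terms at consecutive positions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhraseOverlapDemo {

    /** True if the phrase terms occur at consecutive positions in the field's tokens. */
    static boolean phraseMatches(String fieldValue, String phrase) {
        List<String> tokens = Arrays.asList(fieldValue.trim().split("\\s+"));
        List<String> terms = Arrays.asList(phrase.trim().split("\\s+"));
        return Collections.indexOfSubList(tokens, terms) >= 0;
    }

    public static void main(String[] args) {
        // All authors concatenated into one field value, as in the example above.
        String authors = "name1 surname1 name2 surname2";

        System.out.println(phraseMatches(authors, "name1 surname1")); // true (intended hit)
        System.out.println(phraseMatches(authors, "surname1 name2")); // true (crosses two authors)
        System.out.println(phraseMatches(authors, "name1 name2"));    // false (terms not adjacent)
    }
}
```

If the cross-author hit matters, one option to look into is adding each author as a separate value of the same field and having the analyzer put a position gap between values (see Analyzer.getPositionIncrementGap); whether that fits depends on your analyzer and Lucene version.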

-----Original Message----- 
From: Daniel Susanto [] 
Sent: Saturday, April 18, 2009 9:09 PM 
Subject: Re: Indexing Complex XML 

Thanks Erick, 

In more complex xml I mean, for example this xml: 

<title>Lucene Book</title> 
<author>Book author 1</author> 
<author>Book author 2</author> 
<summary>Book for Lucene</summary> 

<title>Lucene Book 2</title> 


<author>Book 2 author 1</author> 

<author>Book 2 author 2</author> 


<summary>Book 2 for Lucene</summary> 


Each 'book' node is handled by one Document, right? And now,
how do I handle the 'authors' node? Should I put it in a new Document? Or how?

thx. :) 
Daniel Susanto 

--- On Sun, 4/19/09, Erick Erickson <> wrote: 

From: Erick Erickson <> 
Subject: Re: Indexing Complex XML 
Date: Sunday, April 19, 2009, 12:01 AM 

Lucene is an *engine*, not an application. *You* have to process the
XML, decide what the structure of your index is, and index the data. There
are many XML parser options; this is just straight Java code. You'll decide
what's relevant, add the contents of the relevant elements to a Lucene
Document, then add that to your index.

Similarly for searching. 

So, say you have the following simple XML doc 
<ele1>ele 1 text</ele1> 
<ele2>ele 2 text</ele2> 

You'd have to parse that text, then, say, add (semi-pseudo-code) 
Document doc = new Document();
doc.add(new Field("ele1field", "ele 1 text", StoreOption, IndexOption));
doc.add(new Field("ele2field", "ele 2 text", StoreOption, IndexOption));

Then at search time you'd form your queries on "ele1field" and "ele2field".
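For nested XML, the parsing half of this can be sketched with the JDK's built-in DOM parser. This is only one way to do it, under assumptions: the <books>, <book> and <authors> wrapper elements in the sample are guessed, and the class and method names are made up for illustration. It produces one map of field name to values per book; each entry would then go into a Lucene Document with doc.add(new Field(...)) as above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookXmlFields {

    // Hypothetical sample; the <books>, <book> and <authors> wrappers are assumed.
    static final String SAMPLE =
        "<books>"
      + "<book><title>Lucene Book</title>"
      + "<authors><author>Book author 1</author><author>Book author 2</author></authors>"
      + "<summary>Book for Lucene</summary></book>"
      + "<book><title>Lucene Book 2</title>"
      + "<authors><author>Book 2 author 1</author><author>Book 2 author 2</author></authors>"
      + "<summary>Book 2 for Lucene</summary></book>"
      + "</books>";

    /** Returns one field map per <book>; a repeated element keeps all of its values. */
    static List<Map<String, List<String>>> extract(String xml) throws Exception {
        // Fully qualified to avoid a clash with Lucene's own Document class.
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<Map<String, List<String>>> books = new ArrayList<>();
        NodeList bookNodes = dom.getElementsByTagName("book");
        for (int i = 0; i < bookNodes.getLength(); i++) {
            Map<String, List<String>> fields = new LinkedHashMap<>();
            collectLeaves((Element) bookNodes.item(i), fields);
            books.add(fields);
        }
        return books;
    }

    /** Walks down nested elements and records the text of every leaf element. */
    static void collectLeaves(Element parent, Map<String, List<String>> fields) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text and whitespace nodes
            }
            Element element = (Element) child;
            if (hasChildElements(element)) {
                collectLeaves(element, fields); // e.g. <authors> wrapping <author>
            } else {
                fields.computeIfAbsent(element.getTagName(), k -> new ArrayList<>())
                      .add(element.getTextContent().trim());
            }
        }
    }

    static boolean hasChildElements(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        for (Map<String, List<String>> book : extract(SAMPLE)) {
            System.out.println(book); // one map per book: field name -> values
        }
    }
}
```

Flattening wrapper elements like this means the nesting itself is not searchable; if you need it, you would have to encode it in the field names or values yourself.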


On Sat, Apr 18, 2009 at 11:19 AM, daniel susanto 

> Hi, 
> I need advice or an example for indexing a complex XML file. I mean XML with not
> just one level of nodes but more than one, for example indexing rss or
> thx b4. 
> Daniel Susanto 
