lucene-java-user mailing list archives

From Catalin Mititelu <>
Subject Re: XML parsing using Lucene in Java
Date Mon, 19 Nov 2007 08:31:47 GMT
Hi Fayyaz,
I recommend using SAX or, perhaps, a custom parser for large XML files. It should be faster
than using Digester. The main difference between these XML parsers is that Digester builds
objects for the entire XML document in memory, whereas with SAX you can parse the document
and add its content to the Lucene index on the fly. On the other hand, with Digester the
document is effectively processed twice: first to map the XML to Digester objects, and a
second time when you use those objects to add their content to the Lucene index.
Digester is very good for small documents and when you don't want to worry about the XML parsing.
A custom parser may be the best solution if you want the best performance. I chose
this solution.
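A minimal sketch of the streaming SAX approach described above, using only the JDK's built-in SAX parser. The `<book>`/`<chapter>`/`<title>`/`<body>` element names, the class name `SaxStreamingIndexer`, and the `Consumer<Map<String, String>>` sink are hypothetical stand-ins, and it assumes `<title>` and `<body>` hold plain text with no nested markup. In a real indexer the consumer would build a Lucene `Document` from the fields and pass it to `IndexWriter.addDocument`; here it just collects maps so the example runs without Lucene. The point is that each chapter is handed off as soon as its closing tag is seen, so only one chapter is held in memory at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingIndexer {

    // Hypothetical handler: emits one "document" (field name -> text) per <chapter>
    // as soon as that element closes. Element names are assumptions for illustration.
    static class ChapterHandler extends DefaultHandler {
        private final Consumer<Map<String, String>> sink; // stand-in for an IndexWriter
        private Map<String, String> current;              // fields of the chapter being read
        private final StringBuilder text = new StringBuilder();

        ChapterHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("chapter")) {
                current = new HashMap<>();
            }
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any chapter
            }
            if (qName.equals("title") || qName.equals("body")) {
                current.put(qName, text.toString().trim());
            } else if (qName.equals("chapter")) {
                sink.accept(current); // hand the chapter to the index immediately
                current = null;       // and drop it, so memory stays bounded
            }
            text.setLength(0);
        }
    }

    // Parses the stream once, pushing each chapter to the sink on the fly.
    static void index(InputStream xml, Consumer<Map<String, String>> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new ChapterHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><chapter><title>One</title><body>First text</body></chapter>"
                   + "<chapter><title>Two</title><body>Second text</body></chapter></book>";
        List<Map<String, String>> docs = new ArrayList<>();
        index(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), docs::add);
        for (Map<String, String> d : docs) {
            System.out.println(d.get("title") + ": " + d.get("body"));
        }
    }
}
```

Running `main` prints `One: First text` and then `Two: Second text`. A Digester-style approach would instead build the whole object graph first and walk it afterwards, which is the extra pass mentioned above.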


----- Original Message ----
From: syedfa <>
Sent: Monday, November 19, 2007 5:43:28 AM
Subject: XML parsing using Lucene in Java

Dear Fellow Lucene Developers:

I am a Java/JSP developer and have started learning Lucene for the
purpose of creating a search engine for some books that I have in XML format.
The XML document is actually quite large, and I would like to provide as [...]
results as possible to the user searching through these books.  My
question is, which XML parser do you recommend using, SAX or Digester?  Is there
a difference?  Does one parser provide better results than the other?
What about performance issues?

Any help that you can provide is greatly appreciated.  I look forward
to hearing from you soon.

Take care.
