lucene-java-user mailing list archives

From Catalin Mititelu <catalinmitit...@yahoo.com>
Subject Re: XML parsing using Lucene in Java
Date Mon, 19 Nov 2007 08:31:47 GMT
Hi Fayyaz,
I recommend using SAX or, possibly, a custom parser for large XML files. It should be faster
than using Digester. The main difference is that Digester has to build its objects for the
entire XML document in memory, whereas with SAX you can parse the document and add its
content to the Lucene index on the fly. With Digester the content is also handled in two
passes: first the XML is turned into Digester-built objects, and then you have to walk those
objects to add their content to the Lucene index.
Digester is very good for small documents, and for cases where you don't want to deal with
XML parsing details yourself.
A custom parser is probably the best solution if you want the best performance. That is the
option I chose.
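
To illustrate the on-the-fly approach with SAX, here is a minimal sketch that uses only the
JDK's built-in SAX parser. The element names (chapter, title, body), the class name
StreamingXmlIndexer and the sink callback are assumptions made up for this example, not
anything from your files. In a real indexer the sink would build a Lucene Document from the
field map and hand it to your IndexWriter; that part is left out so the sketch runs without
Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: streams an XML file and emits one field map per <chapter>.
final class StreamingXmlIndexer {

    /** SAX handler that passes each chapter to the sink as soon as its end tag is seen. */
    static final class ChapterHandler extends DefaultHandler {
        private final Consumer<Map<String, String>> sink; // stand-in for "add to Lucene index"
        private final StringBuilder text = new StringBuilder();
        private Map<String, String> current; // null while we are outside a <chapter>

        ChapterHandler(Consumer<Map<String, String>> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("chapter".equals(qName)) {
                current = new LinkedHashMap<>();
            }
            // Simplification: text is reset at every start tag, so mixed content
            // (markup nested inside <body>) is not handled by this sketch.
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several chunks, so accumulate them.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;
            }
            if ("title".equals(qName) || "body".equals(qName)) {
                current.put(qName, text.toString().trim());
            } else if ("chapter".equals(qName)) {
                sink.accept(current); // only one chapter is held in memory at a time
                current = null;
            }
            text.setLength(0);
        }
    }

    /** Parses the stream and calls the sink once per chapter. */
    static void index(InputStream xml, Consumer<Map<String, String>> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(xml, new ChapterHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book>"
                + "<chapter><title>One</title><body>First text</body></chapter>"
                + "<chapter><title>Two</title><body>Second text</body></chapter>"
                + "</book>";
        index(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                doc -> System.out.println(doc));
    }
}
```

Because each chapter is handed to the sink when its end tag arrives, memory use depends on
the largest chapter rather than on the size of the whole file.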

Regards,
Catalin

----- Original Message ----
From: syedfa <fayyazuddin@gmail.com>
To: java-user@lucene.apache.org
Sent: Monday, November 19, 2007 5:43:28 AM
Subject: XML parsing using Lucene in Java


Dear Fellow Lucene Developers:

I am a Java/JSP developer and have started learning Lucene for the purpose
of creating a search engine for some books that I have in XML format. The
XML document is actually quite large, and I would like to provide as accurate
results as possible to the user searching through these books. My question
is, which XML parser do you recommend using, SAX or Digester? Is there a
difference? Does one parser provide better results than the other? What
about performance issues?

Any help that you can provide is greatly appreciated. I look forward to
hearing from you soon.

Take care.
Sincerely;
Fayyaz
