lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Noll <dan...@nuix.com>
Subject Re: Big size xml file indexing
Date Mon, 22 Jan 2007 23:00:56 GMT
aslam bari wrote:
> Hi Saikrishna, Unluckily my xml structure is not the same, some times
> it goes too long and some times too small on nodes. It may be one
> element go throught the whole document or there may be many elements
> of different types come. So need your help on it how to parse in good
> and efficient way so that less memory use and fast processing.

You haven't given many details on what you expect the output to be, but 
assuming you mean something like this:

     XML document => Lucene document with one field

What I would do is implement a Reader subclass which parses the XML 
document on the fly, perhaps using the StAX API, or some similar 
stream-based parser API.  That way, no matter how large the document is, 
it will only take up as much memory as you choose to store.

For multiple fields, things get a little more interesting.  And if you 
expect the XML document to contain multiple Lucene documents, then 
things get more interesting again.

Daniel

-- 
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://nuix.com/                               Fax: +61 2 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message