lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Richard Taylor" <richard.tay...@newscientist.com>
Subject Re: Indexing multiple equiv fieldnames (revisited)
Date Wed, 22 May 2002 14:56:59 GMT
We're also creating multiple lucene Documents (Article) from one XML
document (ArticleCollection).

Our XML documents are quite large and we have a large quantity of them so we
use a SAX parser to process the XML and create multiple lucene Documents in
the index along the way - this is very much faster than using any DOM parser
to do the same job.

The code we're using to create the index is based on one in the
contributions directory for lucene:

http://jakarta.apache.org/lucene/docs/contributions.html

Its the one called XMLDocument #1 which links to:

http://marc.theaimsgroup.com/?l=lucene-dev&m=100723333506246&w=2

Richard Taylor
New Scientist Developer

----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, May 22, 2002 3:46 PM
Subject: Re: Indexing multiple equiv fieldnames (revisited)


>
> You should have any significant performance change.
>
> In order to do this, use xalan and get a NodeIterator based on a given
> xpath. I think there is an example of this on the xalan site
> (xml.apache.org). I use this same technique to create mulitple physical
> documents from and xml file that has many top level elements.
>
> --Peter
>
> On 5/21/02 8:33 PM, "Mohamed Idrees" <idrees@planet-connections.net>
wrote:
>
> > Can someone clarify me the following:
> >
> > Will there be any significant change in performance, if one was to
> > create individual
> > lucene documents for each of the main element (customer in this case) in
> > one single xml file?
> >
> > Wish someone comes out with contribution to avoid this.
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message