lucene-java-user mailing list archives

From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: should I import the XML file into a mysql dataset ?
Date Tue, 29 Mar 2011 09:49:13 GMT
1 - I'm using Commons Digester as the XML parser. How can I find the
bottleneck? Should I run the code with the Lucene queries commented out,
leaving only the XML parsing?
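Commenting code out works, but another option is to time the two phases separately in one run. The sketch below is only illustrative. It uses the JDK's streaming StAX parser (javax.xml.stream) as a stand-in for Digester, reads from a String rather than the real 100MB file, and assumes each query sits in a <query> element. The `search` callback is a hypothetical placeholder for the actual Lucene search code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PhaseTimer {

    // Streams through the XML and collects the text of every <query> element.
    // The element name "query" is an assumption about the file layout.
    static List<String> parseQueries(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE); // merge adjacent text/CDATA chunks
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> queries = new ArrayList<>();
        StringBuilder text = null; // non-null only while inside a <query> element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT && "query".equals(reader.getLocalName())) {
                text = new StringBuilder();
            } else if (text != null
                    && (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA)) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT && "query".equals(reader.getLocalName())) {
                queries.add(text.toString().trim());
                text = null;
            }
        }
        reader.close();
        return queries;
    }

    // Times parsing alone, then searching the already-parsed queries.
    // Returns {parseNanos, searchNanos}.
    static long[] timePhases(String xml, Consumer<String> search) throws XMLStreamException {
        long t0 = System.nanoTime();
        List<String> queries = parseQueries(xml);
        long parseNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (String q : queries) {
            search.accept(q);
        }
        long searchNanos = System.nanoTime() - t1;
        return new long[] {parseNanos, searchNanos};
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<queries><query>apache lucene</query><query>xml parsing</query></queries>";
        // Hypothetical placeholder: replace with the real Lucene search for each query string.
        Consumer<String> search = q -> { };
        long[] t = timePhases(xml, search);
        System.out.printf("parse: %.1f ms, search: %.1f ms%n", t[0] / 1e6, t[1] / 1e6);
    }
}
```

Whichever phase dominates on the real data is the one worth optimizing. Timings from the first few seconds of a JVM run tend to be skewed by warm-up, so a full-sized run gives more trustworthy numbers.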

2 - I'd also like to know: roughly how long does it take to run a 100MB
file of queries against every document in a 100MB collection, on an Intel
dual-core machine with 4GB of RAM? Are we talking about a few hours? Can I
get an estimate?
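File sizes alone are unlikely to give a reliable figure. Lucene searches an inverted index and does not compare each query against every document one by one, so the cost of a query depends mainly on its terms and how many documents contain them. A practical estimate is to time a representative sample of queries and extrapolate linearly. The sketch below assumes that approach. The sample numbers in `main` are hypothetical, and `search` is a placeholder for the real Lucene call.

```java
import java.util.List;
import java.util.function.Consumer;

public class RuntimeEstimate {

    // Linear extrapolation: if sampleSize queries took elapsedNanos,
    // totalQueries queries should take about totalQueries * (elapsedNanos / sampleSize).
    static double extrapolateSeconds(long elapsedNanos, int sampleSize, long totalQueries) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        double nanosPerQuery = elapsedNanos / (double) sampleSize;
        return nanosPerQuery * totalQueries / 1e9;
    }

    // Runs the (placeholder) search over a sample and extrapolates to the full query count.
    static double estimateSeconds(List<String> sample, long totalQueries, Consumer<String> search) {
        long start = System.nanoTime();
        for (String q : sample) {
            search.accept(q);
        }
        return extrapolateSeconds(System.nanoTime() - start, sample.size(), totalQueries);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1,000 sampled queries taking 2 s, out of 500,000 in total.
        double secs = extrapolateSeconds(2_000_000_000L, 1_000, 500_000L);
        System.out.printf("estimated: %.0f s (%.1f h)%n", secs, secs / 3600);
    }
}
```

The linear model assumes the sample is representative of the whole query file and ignores caching effects, so the result is a rough order of magnitude, not a promise.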

thanks



On 29 March 2011 11:43, Ian Lea <ian.lea@gmail.com> wrote:

> You need to figure out what is taking the time, for example by reading
> the XML file without making any lucene queries.  What XML parsing
> process are you using?  Some are faster than others.  A google search
> should find loads of info.
>
> If it turns out that it is lucene searching taking most of the time,
> see http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
>
> But do the figuring out first - there is little point in speeding up
> the bit that is already quick.
>
>
> --
> Ian.
>
>
> On Tue, Mar 29, 2011 at 10:22 AM, Patrick Diviacco
> <patrick.diviacco@gmail.com> wrote:
> > hi,
> >
> > I'm performing multiple queries (stored in a 100MB XML file) against a
> > collection (indexed with Lucene; it was originally stored in a 100MB XML
> > file).
> >
> > The process takes quite a long time on my machine (more than 2 hours), so
> > I was wondering whether importing the 100MB query XML file into a MySQL
> > database and extracting the queries with Java would dramatically improve
> > performance (compared to working with Java and an XML text file).
> >
> > thanks
> >
>
