From: Ian Lea <ian.lea@gmail.com>
Date: Tue, 29 Mar 2011 10:59:22 +0100
Message-ID: <BANLkTiko_zD_AKheGTRd50GOaSEyYSi0=g@mail.gmail.com>
Subject: Re: should I import the XML file into a mysql dataset ?
To: java-user@lucene.apache.org

> 1 - I'm using Commons Digester as the XML parser. How can I find the
> bottleneck? Should I run the code with the Lucene queries part commented
> out and just leave the XML parsing?

That is what I was suggesting.
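
One way to do that without deleting code is to time the two phases
separately in the same run. A minimal sketch, assuming Java 8 or later;
PhaseTimer and the stand-in lambdas are hypothetical and only mark where
your Digester parsing and Lucene search calls would go:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Accumulates wall-clock time per named phase (hypothetical helper, not part of Lucene). */
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the given phase, and returns its result. */
    <T> T time(String phase, Callable<T> work) {
        long start = System.nanoTime();
        try {
            return work.call();
        } catch (Exception e) {
            // Wrap checked exceptions (e.g. IOException from a search) so callers stay simple.
            throw new RuntimeException(e);
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a phase, or 0 if it never ran. */
    long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    /** One line per phase, in the order the phases were first seen. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            sb.append(String.format("%-8s %d ms%n", e.getKey(), e.getValue() / 1_000_000));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            // Stand-in for pulling one query out of the XML file.
            String query = timer.time("parse", () -> "query-" + n);
            // Stand-in for running that query against the index.
            timer.time("search", () -> query.length());
        }
        System.out.print(timer.report());
    }
}
```

Whichever phase dominates the report is the one worth optimising.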

> 2 - I actually also wanted to know the following: how long does it take to
> run a 100MB queries text file against each single document of a 100MB
> collection, on an Intel Dual Duo Core with 4GB RAM? Are we talking about
> a few hours? Can I have an estimate?

How many queries are there in the file?
How many documents are there in the lucene index?
How big is the lucene index?
How long does a typical single query take?

What do you mean by "run ... against each single document"?
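
Once the query count and the typical time per query are known, a rough
estimate is simple arithmetic: total time is about the number of queries
times the time per query, assuming the queries run one after another and
the XML parsing cost is negligible. A sketch with made-up figures (100,000
queries at 50 ms each are assumptions for illustration, not measurements
from this thread; RunEstimate is a hypothetical name):

```java
import java.time.Duration;

/** Back-of-envelope estimate for a sequential batch of queries (hypothetical helper). */
class RunEstimate {
    /** Estimated total = number of queries x average time per query. */
    static Duration estimate(long queryCount, Duration perQuery) {
        return perQuery.multipliedBy(queryCount);
    }

    public static void main(String[] args) {
        // Both inputs are placeholders; substitute your own measured values.
        long queryCount = 100_000;
        Duration perQuery = Duration.ofMillis(50);

        Duration total = estimate(queryCount, perQuery);
        System.out.println("Estimated total: " + total.toMinutes() + " minutes");
    }
}
```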


--
Ian.


> On 29 March 2011 11:43, Ian Lea <ian.lea@gmail.com> wrote:
>
>> You need to figure out what is taking the time, for example by reading
>> the XML file without making any lucene queries. What XML parsing
>> process are you using? Some are faster than others. A google search
>> should find loads of info.
>>
>> If it turns out that it is lucene searching taking most of the time,
>> see http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>
>>
>> But do the figuring out first - there is little point in speeding up
>> the bit that is already quick.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, Mar 29, 2011 at 10:22 AM, Patrick Diviacco
>> <patrick.diviacco@gmail.com> wrote:
>> > hi,
>> >
>> > I am performing multiple queries (stored in a 100MB XML file) against a
>> > collection (indexed with lucene; it was previously stored in a 100MB XML
>> > file).
>> >
>> > The process seems pretty long on my machine (more than 2 hours), so I was
>> > wondering if importing the 100MB queries XML file into a mysql dataset
>> > and extracting them with Java would dramatically improve the performance
>> > (rather than working with Java + an xml text file).
>> >
>> > thanks
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
