From: Ian Lea
Date: Tue, 29 Mar 2011 10:59:22 +0100
Subject: Re: should I import the XML file into a mysql dataset ?
To: java-user@lucene.apache.org

> 1 - I'm using commons Digester as the XML parser; how can I find the bottleneck?
> Should I run the code with the Lucene queries part commented out, leaving
> just the XML parsing?

That is what I was suggesting.

> 2 - I also wanted to know the following: how long does it take to
> run a 100MB queries text file against each single document of a 100MB
> collection, on an Intel Dual Duo Core with 4GB RAM? Are we talking about
> a few hours? Can I have an estimate?

How many queries are there in the file?
How many documents are there in the Lucene index?
How big is the Lucene index?
How long does a typical single query take?
What do you mean by "run ... against each single document"?

--
Ian.

> On 29 March 2011 11:43, Ian Lea wrote:
>
>> You need to figure out what is taking the time, for example by reading
>> the XML file without making any Lucene queries. What XML parsing
>> process are you using? Some are faster than others. A Google search
>> should find loads of info.
>>
>> If it turns out that Lucene searching is taking most of the time,
>> see http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>
>> But do the figuring out first: there is little point in speeding up
>> the bit that is already quick.
>>
>> --
>> Ian.
>>
>> On Tue, Mar 29, 2011 at 10:22 AM, Patrick Diviacco
>> wrote:
>> > hi,
>> >
>> > I'm performing multiple queries (stored in a 100MB XML file) against a
>> > collection (indexed with Lucene; it was previously stored in a 100MB XML
>> > file).
>> >
>> > The process seems pretty long on my machine (more than 2 hours), so I was
>> > wondering if importing the 100MB queries XML file into a mysql dataset and
>> > extracting them with Java would dramatically improve the performance
>> > (rather than working with Java + an XML text file).
>> >
>> > thanks
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
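A minimal Java sketch of the measurement Ian suggests: time a parse-only pass over the query file, then time the searches for a sample of queries and extrapolate, since the total is roughly parse time plus (number of queries) x (average time per query). This is not code from the thread. It swaps the commons Digester parser for the JDK's streaming StAX API (javax.xml.stream), assumes each query is the text of a <query> element, and uses a hypothetical search(String) placeholder where the real Lucene call would go.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class QueryTiming {

    // Streams the query XML with StAX and hands the text of each <query>
    // element to the callback. The element name "query" is an assumption
    // about the file layout. Returns the number of queries seen.
    static int forEachQuery(InputStream in, Consumer<String> onQuery) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                int count = 0;
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "query".equals(r.getLocalName())) {
                        onQuery.accept(r.getElementText()); // leaves reader on </query>
                        count++;
                    }
                }
                return count;
            } finally {
                r.close(); // does not close the underlying stream
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed query XML", e);
        }
    }

    // Extrapolates the search cost of the whole file from a timed sample:
    // total ~= totalQueries * (sampledNanos / sampledQueries).
    static double estimateSearchSeconds(int totalQueries, int sampledQueries, long sampledNanos) {
        if (sampledQueries <= 0) {
            throw new IllegalArgumentException("need at least one sampled query");
        }
        double nanosPerQuery = sampledNanos / (double) sampledQueries;
        return nanosPerQuery * totalQueries / 1e9;
    }

    // Hypothetical placeholder: replace with the real Lucene search
    // (build a Query from the string and run it through your IndexSearcher).
    static void search(String query) {
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-in for the 100MB query file.
        byte[] xml = ("<queries><query>lucene xml</query>"
                + "<query>mysql import</query><query>digester</query></queries>")
                .getBytes(StandardCharsets.UTF_8);

        // Pass 1: parse only, no searching.
        long t0 = System.nanoTime();
        int n = forEachQuery(new ByteArrayInputStream(xml), q -> { });
        double parseSeconds = (System.nanoTime() - t0) / 1e9;

        // Pass 2: search only the first `sample` queries, timing just the search calls.
        final int sample = 2; // hypothetical sample size; use far more on real data
        long[] searchNanos = {0};
        int[] searched = {0};
        forEachQuery(new ByteArrayInputStream(xml), q -> {
            if (searched[0] < sample) {
                long s = System.nanoTime();
                search(q);
                searchNanos[0] += System.nanoTime() - s;
                searched[0]++;
            }
        });

        double est = parseSeconds + estimateSearchSeconds(n, searched[0], searchNanos[0]);
        System.out.printf("%d queries, parse %.3f s, estimated total %.3f s%n",
                n, parseSeconds, est);
    }
}
```

On the real data, pass a FileInputStream over the query file (opened in try-with-resources, since XMLStreamReader.close() does not close the underlying stream) and fill in search(). Pass 2 still parses the whole file, but only the search calls are timed. If the parse-only pass accounts for most of the two hours, the XML side is what needs attention; otherwise the ImproveSearchingSpeed page linked above is the place to start.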