Message-ID: <136877.17250.qm@web82108.mail.mud.yahoo.com>
Date: Thu, 20 May 2010 08:45:18 -0700 (PDT)
From: Dennis Gearon
Subject: RE: Machine utilization while indexing
To: solr-user@lucene.apache.org
In-Reply-To: <13D828CF2C5A6D4597111B6E6571FD63090063C75C@GMEXMBS2.globeandmail.net>

Here is a good article from IBM, with code, on how to do hybrid/cloud computing.

http://www.ibm.com/developerworks/library/x-cloudpt1/


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 5/20/10, Nagelberg, Kallin wrote:

> From: Nagelberg, Kallin <KNagelberg@globeandmail.com>
> Subject: RE: Machine utilization while indexing
> To: "'solr-user@lucene.apache.org'"
> Date: Thursday, May 20, 2010, 8:16 AM
> How about throwing a blockingqueue,
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html,
> between your document-creator and solrserver?
> Give it a size of 10,000 or something, with one thread trying to feed it,
> and one thread waiting for it to get near full and then draining it.
> Take the drained results and add them to the server
> (maybe try not using StreamingUpdateSolrServer). Something like
> that worked well for me with about 5,000,000 documents, each
> ~5k, taking about 8 hours.
>
> -Kallin Nagelberg
>
> -----Original Message-----
> From: Thijs [mailto:vonk.thijs@gmail.com]
> Sent: Thursday, May 20, 2010 11:02 AM
> To: solr-user@lucene.apache.org
> Subject: Machine utilization while indexing
>
> Hi.
>
> I have a question about how I can get Solr to index quicker
> than it does at the moment.
>
> I have to index (and re-index) some 3-5 million documents.
> These documents are preprocessed by a Java application that
> effectively combines multiple database tables with each other
> to form the SolrInputDocument.
>
> What I'm seeing, however, is that the queue of documents that
> are ready to be sent to the Solr server exceeds my preset limit,
> telling me that Solr somehow can't process the documents fast enough.
>
> (I have created my own queue in front of
> Solrj.StreamingUpdateSolrServer, as it would not process the
> documents fast enough, causing OutOfMemoryExceptions due to the
> large number of documents building up in its queue.)
>
> I have an index that for 95% consists of IDs (Long). We don't do any
> analysis on the fields that are being indexed.
> The schema is rather straightforward.
>
> Most fields look like
> <fieldType name="long" class="solr.LongField" omitNorms="true"/>
> indexed="true" required="true" />
> indexed="true" multiValued="true"/>
>
> The relevant solrconfig.xml
>
>    false
>    100
>    256
>    2147483647
>    10000
>    1000
>    10000
>    single
> </indexDefaults>
>
>
> The machines I'm testing on have an
> Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
> with 4GB of RAM,
> running on Linux, Java version 1.6.0_17, Tomcat 6 and Solr version 1.4.
>
> What I'm seeing is that the network almost never reaches
> more than 10% of the 1GB/s connection,
> that the CPU utilization is always below 25% (1 core is
> used, not the others),
> and that I don't see heavy disk I/O.
> Also, while indexing, the memory consumption is:
> Free memory: 212.15 MB Total memory: 509.12 MB Max memory: 2730.68 MB
>
> In the beginning (with an empty index) I get 2ms per insert, but
> this slows to 18-19ms per insert.
>
> Are there any tips/tricks I can use to speed up my indexing? I
> have a feeling that my machine is capable of doing more
> (use more CPUs). I just can't figure out how.
>
> Thijs
>
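
For reference, the BlockingQueue approach Kallin describes in the quoted message might look roughly like the sketch below. This is not code from the thread: QueuedIndexer, BatchSink and drainToSink are hypothetical names, and BatchSink only stands in for whatever client call receives a batch (with SolrJ that could be SolrServer.add(Collection<SolrInputDocument>)). The sketch also sends a batch as soon as documents are available instead of waiting for the queue to get near full, which is a simplification of what Kallin describes. It uses only java.util.concurrent, so it runs without Solr.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class QueuedIndexer {

    /** Hypothetical stand-in for the Solr client; not part of SolrJ. */
    public interface BatchSink<T> {
        void addBatch(Collection<T> docs) throws Exception;
    }

    /**
     * Takes documents off the bounded queue and hands them to the sink in
     * batches of at most batchSize. Returns the number of documents sent
     * once the producer has finished and the queue is empty.
     */
    public static <T> int drainToSink(BlockingQueue<T> queue, BatchSink<T> sink,
                                      int batchSize, AtomicBoolean producerDone)
            throws Exception {
        int total = 0;
        while (true) {
            // Wait briefly for one document so the loop does not spin on an empty queue.
            T first = queue.poll(100, TimeUnit.MILLISECONDS);
            if (first == null) {
                // producerDone is set only after the last put(), so an empty
                // queue at this point means there is nothing left to send.
                if (producerDone.get() && queue.isEmpty()) {
                    return total;
                }
                continue;
            }
            List<T> batch = new ArrayList<>(batchSize);
            batch.add(first);
            // Grab whatever else is already queued, up to the batch size.
            queue.drainTo(batch, batchSize - 1);
            sink.addBatch(batch);
            total += batch.size();
        }
    }

    public static void main(String[] args) throws Exception {
        // Bounded queue: put() blocks the producer when the consumer falls
        // behind, so pending documents cannot pile up without limit.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
        AtomicBoolean producerDone = new AtomicBoolean(false);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 25; i++) {
                    queue.put("doc-" + i); // stands in for building a SolrInputDocument
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                producerDone.set(true);
            }
        });
        producer.start();

        int sent = drainToSink(queue,
                batch -> System.out.println("sending " + batch.size() + " docs"),
                8, producerDone);
        producer.join();
        System.out.println("total sent: " + sent);
    }
}
```

The bounded capacity is what addresses the OutOfMemoryException mentioned in the original question: the document creator simply blocks until the consumer catches up. To use more than one core, several consumer threads could call drainToSink on the same queue, though whether that helps depends on where the server-side bottleneck is.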