Date: Tue, 26 Apr 2011 20:01:28 -0400
From: Charles Wardell
To: solr-user@lucene.apache.org
Message-ID: <98F8535AABE147D9A0BA1DFB143EDE31@bcsolution.com>
In-Reply-To: <757199.45080.qm@web130101.mail.mud.yahoo.com>
References: <3CE71CB6-F237-44AA-9FE2-F04C78A6EDCA@bcsolution.com> <757199.45080.qm@web130101.mail.mud.yahoo.com>
Subject: Re: Question on Batch process

Thank you Otis.

Without trying to appear too stupid: when you refer to having the params match your number of CPU cores, are you talking about the number of threads I can spawn with the StreamingUpdateSolrServer object?

Up until now, I have just been using post.sh or post.jar. Are these capable of that, or do I need to write some code to collect a bunch of files into the buffer and send it off?

Also, do you have a sense of how long it should take to index 100,000 files, or in my case 100,000,000 documents?

StreamingUpdateSolrServer

    public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws MalformedURLException

Thanks again,
Charlie

-- 
Best Regards,
Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote:

> Charlie,
>
> How's this:
> * -Xmx2g
> * ramBufferSizeMB 512
> * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
> * ignore/delete maxBufferedDocs - not used if you use ramBufferSizeMB
> * use StreamingUpdateSolrServer (with params matching your number of CPU cores)
>   or send batches of say 1000 docs with the other SolrServer impl using N threads
>   (N=# of your CPU cores)
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> ----- Original Message ----
> > From: Charles Wardell
> > To: solr-user@lucene.apache.org
> > Sent: Tue, April 26, 2011 2:32:29 PM
> > Subject: Question on Batch process
> >
> > I am sure that this question has been asked a few times, but I can't seem to
> > find the sweet spot for indexing.
> >
> > I have about 100,000 files, each containing 1,000 XML documents, ready to be
> > posted to Solr. I want it to index as quickly as possible; once that is
> > completed, the daily stream of ADDs will be small in comparison.
> >
> > The individual documents are small. Essentially web postings from the net.
> > Title, postPostContent, date.
> >
> > What would be the ideal configuration for ramBufferSizeMB, mergeFactor,
> > maxBufferedDocs, etc.?
> >
> > My machine is a quad-core with hyper-threading, so it shows up as 8 CPUs in top.
> > I have 16GB of available RAM.
> >
> > Thanks in advance.
> > Charlie
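
The second option Otis mentions (batches of about 1000 docs sent from N threads, with N = the number of CPU cores) can be sketched in plain Java as below. This is only a minimal sketch under stated assumptions, not SolrJ code: BatchIndexer, BatchSender, partition and indexAll are hypothetical names made up for illustration, and BatchSender stands in for whatever call actually posts a batch to Solr. The documents are plain strings here purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class BatchIndexer {

    /** Hypothetical stand-in for the call that posts one batch to Solr. */
    interface BatchSender {
        void send(List<String> batch) throws Exception;
    }

    /** Splits docs into consecutive batches of at most batchSize documents. */
    static List<List<String>> partition(List<String> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    /** Sends every batch on a fixed pool of threadCount workers; returns docs sent. */
    static long indexAll(List<String> docs, int batchSize, int threadCount,
                         BatchSender sender) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        AtomicLong sent = new AtomicLong();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> {
                    sender.send(batch);
                    sent.addAndGet(batch.size());
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the batch and rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        // Dummy documents; real input would come from the XML files.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        // N = number of CPUs the JVM sees (8 on a hyper-threaded quad core).
        int threads = Runtime.getRuntime().availableProcessors();
        AtomicInteger batches = new AtomicInteger();
        long sent = indexAll(docs, 1000, threads, batch -> batches.incrementAndGet());
        System.out.println(batches.get() + " batches, " + sent + " docs");
        // prints: 3 batches, 2500 docs
    }
}
```

In a real indexer the lambda passed as the sender would hand the batch to a SolrServer instance instead of just counting. With StreamingUpdateSolrServer the queueing and threading are handled by the class itself through the queueSize and threadCount constructor arguments, so a hand-rolled pool like this one is only relevant to the "other SolrServer impl" route.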