lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Charles Wardell <charles.ward...@bcsolution.com>
Subject Re: Question on Batch process
Date Wed, 27 Apr 2011 00:01:28 GMT
Thank you Otis.
Without trying to appear to stupid, when you refer to having the params matching your # of
CPU cores, you are talking about the # of threads I can spawn with the StreamingUpdateSolrServer
object?
Up until now, I have been just utilizing post.sh or post.jar. Are these capable of that or
do I need to write some code to collect a bunch of files into the buffer and send it off?

Also, Do you have a sense for how long it should take to index 100,000 files or in my case
100,000,000 documents?
StreamingUpdateSolrServer
public StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount) throws
MalformedURLException

Thanks again,
Charlie

-- 
Best Regards,

Charles Wardell
Blue Chips Technology, Inc.
www.bcsolution.com

On Tuesday, April 26, 2011 at 5:12 PM, Otis Gospodnetic wrote: 
> Charlie,
> 
> How's this:
> * -Xmx2g
> * ramBufferSizeMB 512
> * mergeFactor 10 (default, but you could up it to 20, 30, if ulimit -n allows)
> * ignore/delete maxBufferedDocs - not used if you ran ramBufferSizeMB
> * use SolrStreamingUpdateServer (with params matching your number of CPU cores) 
> or send batches of say 1000 docs with the other SolrServer impl using N threads 
> (N=# of your CPU cores)
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> ----- Original Message ----
> > From: Charles Wardell <charles.wardell@bcsolution.com>
> > To: solr-user@lucene.apache.org
> > Sent: Tue, April 26, 2011 2:32:29 PM
> > Subject: Question on Batch process
> > 
> > I am sure that this question has been asked a few times, but I can't seem to 
> > find the sweetspot for indexing.
> > 
> > I have about 100,000 files each containing 1,000 xml documents ready to be 
> > posted to Solr. My desire is to have it index as quickly as possible and then 
> > once completed the daily stream of ADDs will be small in comparison.
> > 
> > The individual documents are small. Essentially web postings from the net. 
> > Title, postPostContent, date. 
> > 
> > 
> > What would be the ideal configuration? For RamBufferSize, mergeFactor, 
> > MaxbufferedDocs, etc..
> > 
> > My machine is a quad core hyper-threaded. So it shows up as 8 cpu's in TOP
> > I have 16GB of available ram.
> > 
> > 
> > Thanks in advance.
> > Charlie
> 

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message