lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SolrCloud commit process is too time consuming, even if documents are light
Date Thu, 25 Jul 2013 14:17:20 GMT
I'm looking into some possible slowdown-after-long-indexing issues when I get back from vacation.
This could be related - very early guess, though.

Another thing you might try - Lucene recently changed the merge scheduler defaults
(in 4.1). It used to use up to 3 threads to merge, with a max merge count of that + 2.
It now defaults to 1 and 2, and that can reduce how fast documents are added by a significant
amount. It also causes indexing threads to pause and wait for merges *way* more, especially
when your index gets large and the merges start taking a long time. The tradeoff was supposedly
that merges are faster, but honestly, I think it's a poor default - especially if you are measuring
indexing speed and not really paying attention to how long merges keep running after you finish indexing,
and especially if you have beefy hardware. You might play with those settings.
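
For reference, the old limits can be configured back in the <indexConfig> section of solrconfig.xml. The snippet below is illustrative only - passing setter arguments to the merge scheduler this way is supported in later Solr releases, and may not work as-is on 4.1; the values shown are the pre-4.1 Lucene defaults:

```xml
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <!-- pre-4.1 Lucene defaults: up to 3 concurrent merge threads,
       and up to 5 pending merges before indexing threads stall -->
  <int name="maxThreadCount">3</int>
  <int name="maxMergeCount">5</int>
</mergeScheduler>
```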

- Mark

On Jul 25, 2013, at 8:36 AM, Radu Ghita <radu@wmds.ro> wrote:

> Forgot to attach server and solr configurations:
> 
> SolrCloud 4.1, internal Zookeeper, 16 shards, custom Java importer.
> Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192GB RAM, 10TB
> SSD and 50TB SAS storage
> 
> 
> On Thu, Jul 25, 2013 at 3:20 PM, Radu Ghita <radu@wmds.ro> wrote:
> 
>> 
>> Hi,
>> 
>> We have a client whose business model requires indexing a billion rows
>> from MySQL into Solr each month, in a small time-frame. The documents are
>> very light, but the count is very high and we need to reach speeds of
>> around 80-100k docs/s. The built-in Solr indexer tops out at 40-50k, and
>> after some hours (~12h) it crashes, with the speed dropping steadily as
>> hours go by.
>> 
>> Therefore we have developed a custom Java importer that connects directly
>> to MySQL and to SolrCloud via Zookeeper, grabs data from MySQL, creates
>> documents and then imports them into Solr. This helps because we open ~50
>> threads and the indexing process speeds up. We have optimized the MySQL
>> queries (MySQL was the initial bottleneck) and the speeds we get now are
>> over 100k/s, but as the index grows, Solr spends a very long time adding
>> documents. I assume something in solrconfig is making Solr stall and even
>> block after 100 million documents have been indexed.
>> 
>> Here is the Java code that creates documents and then adds them to the Solr server:
>> 
>> public void createDocuments() throws SQLException, SolrServerException, IOException
>> {
>>     App.logger.write("Creating documents..");
>>     this.docs = new ArrayList<SolrInputDocument>();
>>     App.logger.incrementNumberOfRows(this.size);
>>     while (this.results.next()) {
>>         this.docs.add(this.getDocumentFromResultSet(this.results));
>>     }
>>     this.results.close();    // close the ResultSet before its Statement
>>     this.statement.close();
>> }
>> 
>> public void commitDocuments() throws SolrServerException, IOException
>> {
>>     App.logger.write("Committing..");
>>     App.solrServer.add(this.docs); // here it stays very long and then blocks
>>     App.logger.incrementNumberOfRows(this.docs.size());
>>     this.docs.clear();
>> }
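
One pattern worth noting in the quoted code: every row of a result set is buffered into a single ArrayList and pushed to Solr in one giant add() call. A common alternative is to flush in fixed-size batches so the heap and each request stay bounded. The sketch below shows only the batching logic; BatchBuffer is a hypothetical helper, not part of SolrJ or the poster's importer - the caller would pass each returned batch to solrServer.add(...):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects items and hands back fixed-size batches
// to send, instead of growing one unbounded list for the whole run.
public class BatchBuffer<T> {
    private final int batchSize;
    private final List<T> buffer;

    public BatchBuffer(int batchSize) {
        this.batchSize = batchSize;
        this.buffer = new ArrayList<T>(batchSize);
    }

    // Add one item; when the buffer fills, return the full batch
    // for sending and reset the buffer, otherwise return null.
    public List<T> add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            List<T> batch = new ArrayList<T>(buffer);
            buffer.clear();
            return batch;
        }
        return null;
    }

    // Return whatever remains at the end of the input (may be empty).
    public List<T> drain() {
        List<T> rest = new ArrayList<T>(buffer);
        buffer.clear();
        return rest;
    }
}
```

Batch sizes of a few hundred to a few thousand documents per add() call are a common starting point; the right number depends on document size and heap.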
>> 
>> I am also pasting the solrconfig.xml parameters relevant to this discussion:
>> 
>> <maxIndexingThreads>128</maxIndexingThreads>
>> <useCompoundFile>false</useCompoundFile>
>> <ramBufferSizeMB>10000</ramBufferSizeMB>
>> <maxBufferedDocs>1000000</maxBufferedDocs>
>> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>>   <int name="maxMergeAtOnce">20000</int>
>>   <int name="segmentsPerTier">1000000</int>
>>   <int name="maxMergeAtOnceExplicit">10000</int>
>> </mergePolicy>
>> <mergeFactor>100</mergeFactor>
>> <termIndexInterval>1024</termIndexInterval>
>> <autoCommit>
>>   <maxTime>15000</maxTime>
>>   <maxDocs>1000000</maxDocs>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>> <autoSoftCommit>
>>   <maxTime>2000000</maxTime>
>> </autoSoftCommit>
>> 
>> The big problem lies in Solr: running the MySQL queries on their own, the
>> speed is great, but as time passes the Solr add call takes far too long and
>> then blocks, even though the server is top-of-the-line and has plenty of
>> resources.
>> 
>> I'm new to this, so please assist. Thanks,
>> --
>> 
>> *Radu Ghita*
>> --------------------------------
>> Tel: +40 721 18 18 68
>> Fax: +40 351 81 85 52
>> 

