lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Solr indexing not working when I try to insert 1000000 rows but works fine when I try to index 400000 rows or below
Date Tue, 03 Jun 2014 13:10:23 GMT
On 6/3/2014 12:00 AM, madhav bahuguna wrote:
> I am using Solr 4.7.1 and trying to do a full import. My data source is a
> table in MySQL. It has 10000000 rows and 20 columns.
> 
> Whenever I try to do a full import, Solr stops responding. But when I
> try to do an import with a limit of 400000 or less, it works fine.

<snip>

> Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
> 
> The last packet successfully received from the server was 130,037
> milliseconds ago. The last packet sent successfully to the server was
> 130,038 milliseconds ago. at

If you aren't already using batchSize=-1 as Ahmet mentioned, you should
be ... but I suspect that your actual problem is different, and has to
do with merges on multiple tiers being scheduled at the same time.  When
indexing continues long enough, you end up with several merges scheduled
at once.  As long as more merges are scheduled than the configured limit
(maxMergeCount, default 2), all indexing will stop.  If this continues
for long enough, JDBC (or MySQL itself) will close the database
connection because of inactivity, which fits the "Communications link
failure" in your log.
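
For reference, a DataImportHandler data source with batchSize=-1 might
look something like this.  It is only a sketch: it assumes the stock
JdbcDataSource with the MySQL Connector/J driver, and the URL, database,
table, entity name, and credentials are hypothetical placeholders.  With
the MySQL driver, batchSize=-1 makes DIH set the JDBC fetch size to
Integer.MIN_VALUE, which tells the driver to stream rows instead of
buffering the entire result set in memory.

<dataConfig>
  <!-- Hypothetical connection details: replace url, user, and password.
       batchSize="-1" requests a streaming result set from the MySQL driver. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solr"
              password="secret"
              batchSize="-1"/>
  <document>
    <!-- Hypothetical entity and table names -->
    <entity name="item" query="SELECT * FROM my_table"/>
  </document>
</dataConfig>

Streaming only addresses memory use on the Solr side; it does not stop
the connection from timing out if indexing stalls on merges.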

You need to increase indexConfig/mergeScheduler/maxMergeCount in
solrconfig.xml.  Here are my indexConfig settings.  The important part
for the problem I have described is the mergeScheduler config:

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
  <ramBufferSizeMB>48</ramBufferSizeMB>
  <infoStream file="INFOSTREAM-${solr.core.name}.txt">false</infoStream>
</indexConfig>

I filed this issue to deal with this problem before people run into it:

https://issues.apache.org/jira/browse/LUCENE-5705

I am still working out the best way to tackle the problem with other
committers.

Thanks,
Shawn

