From: "Artem Russakovskii (JIRA)"
To: solr-dev@lucene.apache.org
Reply-To: solr-dev@lucene.apache.org
Date: Thu, 24 Sep 2009 22:51:16 -0700 (PDT)
Subject: [jira] Commented: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
Message-ID: <608936501.1253857876026.JavaMail.jira@brutus>
In-Reply-To: <589869744.1253760196259.JavaMail.jira@brutus>
Mailing-List: contact solr-dev-help@lucene.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[
https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759392#action_12759392 ]

Artem Russakovskii commented on SOLR-1458:
------------------------------------------

Paul - I'm just glad you guys are so fast to respond and eager to fix. Love OSS :-]

> Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
> ------------------------------------------------------------------------------------------
>
> Key: SOLR-1458
> URL: https://issues.apache.org/jira/browse/SOLR-1458
> Project: Solr
> Issue Type: Bug
> Components: replication (java)
> Affects Versions: 1.4
> Environment: CentOS x64
> 8GB RAM
> Tomcat, running with 7G max memory; memory usage is <2GB, so it's not the problem
> Host a: master
> Host b: slave
> Multiple single core Solr instances, using JNDI.
> Java replication
> Reporter: Artem Russakovskii
> Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1458.patch, SOLR-1458.patch, SOLR-1458.patch
>
>
> After finally figuring out the new Java based replication, we have started both the slave and the master and issued optimize to all master Solr instances. This triggered some replication to go through just fine, but it looks like some of it is failing.
> Here's what I'm getting in the slave logs, repeatedly for each shard:
> {code}
> SEVERE: SnapPull failed
> java.lang.NullPointerException
>     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
>     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
>     at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> {code}
> If I issue an optimize again on the master to one of the shards, it then triggers a replication and replicates OK. I have a feeling that these SnapPull failures appear later on but right now I don't have enough to form a pattern.
> Here's replication.properties on one of the failed slave instances.
> {code}
> cat data/replication.properties
> #Replication details
> #Wed Sep 23 19:35:30 PDT 2009
> replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
> previousCycleTimeInSeconds=0
> timesFailed=113
> indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
> indexReplicatedAt=1253759730020
> replicationFailedAt=1253759730020
> lastCycleBytesDownloaded=0
> timesIndexReplicated=113
> {code}
> and another
> {code}
> cat data/replication.properties
> #Replication details
> #Wed Sep 23 18:42:01 PDT 2009
> replicationFailedAtList=1253756490034,1253756460169
> previousCycleTimeInSeconds=1
> timesFailed=2
> indexReplicatedAtList=1253756521284,1253756490034,1253756460169
> indexReplicatedAt=1253756521284
> replicationFailedAt=1253756490034
> lastCycleBytesDownloaded=22932293
> timesIndexReplicated=3
> {code}
> Some relevant configs:
> In solrconfig.xml:
> {code}
>
>
>
> ${enable.master:false}
> optimize
> optimize
> 00:00:20
>
>
> ${enable.slave:false}
>
> ${master.url}
>
> 00:00:30
>
>
> {code}
> The slave then has this in solrcore.properties:
> {code}
> enable.slave=true
> master.url=URLOFMASTER/replication
> {code}
> and the master has
> {code}
> enable.master=true
> {code}
> I'd be glad to provide more details but I'm not sure what else I can do. SOLR-926 may be relevant.
> Thanks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.