lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Artem Russakovskii (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-1458) Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
Date Thu, 24 Sep 2009 21:04:16 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759247#action_12759247
] 

Artem Russakovskii edited comment on SOLR-1458 at 9/24/09 2:03 PM:
-------------------------------------------------------------------

Re: hit the master with the filelist command.

First of all, this may have been a really late night for the person who wrote the wiki:
"Get list of lucene files present in the index: http://host:port/solr/replication?command=filelist&indexversion=<index-version-number>
. The version number can be obtained using the indexversion calmmand"
The last word there ;-]

Now, I hit the master with the following: MASTER/replication/?command=indexversion, get back
{code}
<response> 
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><long
name="indexversion">1253136035158</long><long name="generation">4447</long>

</response> 
{code}

Then I use this in the following query: MASTER/replication/?command=filelist&indexversion=1253136035158

but I get back
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><str
name="status">invalid indexversion</str>
</response>
{code}

I tried the same against an another instance that doesn't have the NullPointerException replication
problem and still get the same error, then I tried another one, which had no documents indexed
into yet, and it returned
{code}
<response> 
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><arr
name="filelist"><lst><str name="name">segments_1</str><long name="lastmodified">1253136032000</long><long
name="size">32</long></lst></arr> 
</response> 
{code}

Here's what I see in the logs.

On the master:
{code}
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_08 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_04 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_10 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_02 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_12 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_06 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_08 path=/replication params={indexversion=1253136074767&command=filelist&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_06 path=/replication params={indexversion=1253136077032&command=filelist&wt=javabin}
status=0 QTime=0 
{code}

On the slave:
{code}
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1253136074767, generation: 40983
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1253136077032, generation: 42291
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1253136076722, generation: 41981
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1253136074452, generation: 40668
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed 
java.lang.NullPointerException
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed 
java.lang.NullPointerException
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
{code}

Suggestions?

      was (Author: archon810):
    Re: hit the master with the filelist command.

First of all, this may have been a really late night for the person who wrote the wiki:
"Get list of lucene files present in the index: http://host:port/solr/replication?command=filelist&indexversion=<index-version-number>
. The version number can be obtained using the indexversion calmmand"
The last word there ;-]

Now, I hit the master with the following: MASTER/replication/?command=indexversion, get back
{code}
<response> 
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><long
name="indexversion">1253136035158</long><long name="generation">4447</long>

</response> 
{code}

Then I use this in the following query: MASTER/replication/?command=filelist&indexversion=1253136035158

but I get back
{code}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><str
name="status">invalid indexversion</str>
</response>
{code}

I tried the same against an another instance that doesn't have the NullPointerException replication
problem and still get the same error, then I tried another one, which had no documents indexed
into yet, and it returned
{code}
<response> 
<lst name="responseHeader"><int name="status">0</int><int name="QTime">0</int></lst><arr
name="filelist"><lst><str name="name">segments_1</str><long name="lastmodified">1253136032000</long><long
name="size">32</long></lst></arr> 
</response> 
{code}

The weird part is that even though the master responds with invalid indexversion for almost
every instance I've tried, only 2 of them are failing as far as the slave is concerned:

On the master:
{code}
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_08 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_04 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_10 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_02 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_12 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_06 path=/replication params={command=indexversion&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_08 path=/replication params={indexversion=1253136074767&command=filelist&wt=javabin}
status=0 QTime=0 
Sep 24, 2009 1:56:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/events_2009_06 path=/replication params={indexversion=1253136077032&command=filelist&wt=javabin}
status=0 QTime=0 
{code}

On the slave:
{code}
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1253136074767, generation: 40983
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's version: 1253136077032, generation: 42291
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1253136076722, generation: 41981
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's version: 1253136074452, generation: 40668
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed 
java.lang.NullPointerException
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Sep 24, 2009 2:01:00 PM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed 
java.lang.NullPointerException
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
        at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
{code}
Suggestions?
  
> Java Replication error: NullPointerException SEVERE: SnapPull failed on 2009-09-22 nightly
> ------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1458
>                 URL: https://issues.apache.org/jira/browse/SOLR-1458
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>    Affects Versions: 1.4
>         Environment: CentOS x64
> 8GB RAM
> Tomcat, running with 7G max memory; memory usage is <2GB, so it's not the problem
> Host a: master
> Host b: slave
> Multiple single core Solr instances, using JNDI.
> Java replication
>            Reporter: Artem Russakovskii
>            Assignee: Noble Paul
>             Fix For: 1.4
>
>         Attachments: SOLR-1458.patch, SOLR-1458.patch
>
>
> After finally figuring out the new Java based replication, we have started both the slave
and the master and issued optimize to all master Solr instances. This triggered some replication
to go through just fine, but it looks like some of it is failing.
> Here's what I'm getting in the slave logs, repeatedly for each shard:
> {code} 
> SEVERE: SnapPull failed 
> java.lang.NullPointerException
>         at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:271)
>         at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:258)
>         at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> {code} 
> If I issue an optimize again on the master to one of the shards, it then triggers a replication
and replicates OK. I have a feeling that these SnapPull failures appear later on but right
now I don't have enough to form a pattern.
> Here's replication.properties on one of the failed slave instances.
> {code}
> cat data/replication.properties 
> #Replication details
> #Wed Sep 23 19:35:30 PDT 2009
> replicationFailedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
> previousCycleTimeInSeconds=0
> timesFailed=113
> indexReplicatedAtList=1253759730020,1253759700018,1253759670019,1253759640018,1253759610018,1253759580022,1253759550019,1253759520016,1253759490026,1253759460016
> indexReplicatedAt=1253759730020
> replicationFailedAt=1253759730020
> lastCycleBytesDownloaded=0
> timesIndexReplicated=113
> {code}
> and another
> {code}
> cat data/replication.properties 
> #Replication details
> #Wed Sep 23 18:42:01 PDT 2009
> replicationFailedAtList=1253756490034,1253756460169
> previousCycleTimeInSeconds=1
> timesFailed=2
> indexReplicatedAtList=1253756521284,1253756490034,1253756460169
> indexReplicatedAt=1253756521284
> replicationFailedAt=1253756490034
> lastCycleBytesDownloaded=22932293
> timesIndexReplicated=3
> {code}
> Some relevant configs:
> In solrconfig.xml:
> {code}
> <!-- For docs see http://wiki.apache.org/solr/SolrReplication -->
>   <requestHandler name="/replication" class="solr.ReplicationHandler" >
>     <lst name="master">
>         <str name="enable">${enable.master:false}</str>
>         <str name="replicateAfter">optimize</str>
>         <str name="backupAfter">optimize</str>
>         <str name="commitReserveDuration">00:00:20</str>
>     </lst>
>     <lst name="slave">
>         <str name="enable">${enable.slave:false}</str>
>         <!-- url of master, from properties file -->
>         <str name="masterUrl">${master.url}</str>
>         <!-- how often to check master -->
>         <str name="pollInterval">00:00:30</str>
>     </lst>
>   </requestHandler>
> {code}
> The slave then has this in solrcore.properties:
> {code}
> enable.slave=true
> master.url=URLOFMASTER/replication
> {code}
> and the master has
> {code}
> enable.master=true
> {code}
> I'd be glad to provide more details but I'm not sure what else I can do.  SOLR-926 may
be relevant.
> Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message