lucene-dev mailing list archives

From "Simon Scofield (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-4519) corrupt tlog causes fullCopy download index files every time reboot a node
Date Fri, 01 Mar 2013 07:43:13 GMT

     [ https://issues.apache.org/jira/browse/SOLR-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Scofield updated SOLR-4519:
---------------------------------

    Description: 
There are two questions:
1. The tlog of one replica of shard1 was damaged for some reason. We are still looking for the
cause. Please share any clues if you are familiar with this problem.

2. The damaged replica recovered successfully by downloading the index files from the leader
(fullCopy). I then killed the instance and started it again, and the recovery process was once
again a fullCopy download. In my opinion, the tlog should have been fixed after the first
fullCopy recovery. Here is part of the log:

2013-02-28 15:04:58,622 INFO org.apache.solr.cloud.ZkController:757 - Core needs to recover:metadata
2013-02-28 15:04:58,622 INFO org.apache.solr.update.DefaultSolrCoreState:214 - Running recovery - first canceling any ongoing recovery
2013-02-28 15:04:58,625 INFO org.apache.solr.cloud.RecoveryStrategy:217 - Starting recovery process.  core=metadata recoveringAfterStartup=true
2013-02-28 15:04:58,626 INFO org.apache.solr.common.cloud.ZkStateReader:295 - Updating cloud state from ZooKeeper...
2013-02-28 15:04:58,628 ERROR org.apache.solr.update.UpdateLog:957 - Exception reading versions from log
java.io.EOFException
        at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:72)
        at org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:206)
        at org.apache.solr.update.TransactionLog$ReverseReader.next(TransactionLog.java:705)
        at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:906)
        at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:846)
        at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:996)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:256)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

2013-02-28 15:05:01,857 INFO org.apache.solr.cloud.RecoveryStrategy:399 - Begin buffering updates. core=metadata
2013-02-28 15:05:01,857 INFO org.apache.solr.update.UpdateLog:1015 - Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
2013-02-28 15:05:01,857 INFO org.apache.solr.cloud.RecoveryStrategy:126 - Attempting to replicate from http://23.61.21.121:65201/solr/metadata/. core=metadata

2013-02-28 15:05:02,882 INFO org.apache.solr.handler.SnapPuller:305 - Master's generation: 6993
2013-02-28 15:05:02,882 INFO org.apache.solr.handler.SnapPuller:306 - Slave's generation: 6993
2013-02-28 15:05:02,882 INFO org.apache.solr.handler.SnapPuller:307 - Starting replication process
2013-02-28 15:05:02,893 INFO org.apache.solr.handler.SnapPuller:312 - Number of files in latest index in master: 422
2013-02-28 15:05:02,897 INFO org.apache.solr.handler.SnapPuller:325 - Starting download to /solr/nodes/node1/bin/../solr/metadata/data/index.20130228150502893 fullCopy=true

2013-02-28 15:33:55,848 INFO org.apache.solr.handler.SnapPuller:334 - Total time taken for download : 1732 secs (The size of index files is 94G)
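
For anyone checking their own node logs for the same pattern, here is a minimal Python sketch. It looks for the three signals visible in the log above: the tlog read error, the fullCopy flag, and the download time. The helper name summarize_recovery and its index_size_gib parameter are hypothetical, not part of Solr, and the "94G" figure is assumed to mean GiB when estimating throughput.

```python
import re

# Three lines copied from the log above, with wrapped lines rejoined.
LOG = """\
2013-02-28 15:04:58,628 ERROR org.apache.solr.update.UpdateLog:957 - Exception reading versions from log
2013-02-28 15:05:02,897 INFO org.apache.solr.handler.SnapPuller:325 - Starting download to /solr/nodes/node1/bin/../solr/metadata/data/index.20130228150502893 fullCopy=true
2013-02-28 15:33:55,848 INFO org.apache.solr.handler.SnapPuller:334 - Total time taken for download : 1732 secs (The size of index files is 94G)
"""


def summarize_recovery(log_text, index_size_gib=None):
    """Hypothetical helper: extract recovery facts from Solr log text."""
    # Did the update log fail while reading recent versions?
    tlog_read_error = "Exception reading versions from log" in log_text

    # Did replication fall back to copying the whole index?
    copy_match = re.search(r"fullCopy=(true|false)", log_text)
    full_copy = copy_match is not None and copy_match.group(1) == "true"

    # How long did the download take, according to SnapPuller?
    time_match = re.search(r"Total time taken for download : (\d+) secs", log_text)
    download_secs = int(time_match.group(1)) if time_match else None

    # Rough throughput, only if the caller supplies the index size (assumed GiB).
    mib_per_sec = None
    if download_secs and index_size_gib is not None:
        mib_per_sec = index_size_gib * 1024 / download_secs

    return {
        "tlog_read_error": tlog_read_error,
        "full_copy": full_copy,
        "download_secs": download_secs,
        "mib_per_sec": mib_per_sec,
    }


if __name__ == "__main__":
    print(summarize_recovery(LOG, index_size_gib=94))
```

Under the GiB assumption, the 1732-second download works out to roughly 55.6 MiB/s, which is the cost paid on every restart while the replica keeps falling back to fullCopy.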

  was:
There are two questions:
1. The tlog of one replica of shard1 was damaged for some reason. We are still looking for the
cause. Please share any clues if you are familiar with this problem.

2. The damaged replica recovered successfully by downloading the index files from the leader
(fullCopy). I then killed the instance and started it again, and the recovery process was once
again a fullCopy download. In my opinion, the tlog should have been fixed after the first
fullCopy recovery. Here is part of the log:

    
> corrupt tlog causes fullCopy download index files every time reboot a node
> --------------------------------------------------------------------------
>
>                 Key: SOLR-4519
>                 URL: https://issues.apache.org/jira/browse/SOLR-4519
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>         Environment: The SolrCloud cluster runs on three servers, with three Solr instances on each server. The collection has three shards, and every shard has three replicas. Replicas of the same shard run in Solr instances on different servers.
>            Reporter: Simon Scofield
>


