lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alan Woodward (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4531) Add tests to ensure that recovery does not fail on corrupted tlogs
Date Mon, 31 Oct 2016 08:52:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621647#comment-15621647
] 

Alan Woodward commented on SOLR-4531:
-------------------------------------

I've seen this test fail a couple of times, I think due to collections not being fully active
when the \@Before method is run: see https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2078/consoleFull
for example.  If the collection isn't fully active, then the DBQ and commit get buffered,
and the query to check that the collection is empty is served by an old searcher and consequently
fails.

To fix, I'd suggest either adding a waitForState() call in the \@Before method, or just creating
a new collection per-method.

> Add tests to ensure that recovery does not fail on corrupted tlogs
> ------------------------------------------------------------------
>
>                 Key: SOLR-4531
>                 URL: https://issues.apache.org/jira/browse/SOLR-4531
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Simon Scofield
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 6.3, master (7.0)
>
>         Attachments: SOLR-4531.patch, SOLR-4531.patch
>
>
> One of the solr nodes in our SolrCloud was killed. It caused tlog was corrupted. Now
the node can't finish recoverying. There is an excepion:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 14, Size: 13
> 	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> 	at java.util.ArrayList.get(ArrayList.java:322)
> 	at org.apache.solr.update.TransactionLog$LogCodec.readExternString(TransactionLog.java:128)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
> 	at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:120)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:184)
> 	at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
> 	at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:121)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:184)
> 	at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
> 	at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
> 	at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
> 	at org.apache.solr.update.TransactionLog$ReverseReader.next(TransactionLog.java:708)
> 	at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:906)
> 	at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:846)
> 	at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:996)
> 	at org.apache.solr.update.UpdateLog.init(UpdateLog.java:241)
> 	at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:94)
> 	at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:123)
> 	at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:97)
> 	... 31 more
> I check the code in UpdateLog.java. I find that only IOException is catched when the
above expception happens.
> {code:title=solr\\core\\src\\java\\org\\apache\\solr\\update\\UpdateLog.java|borderStyle=solid}
>     private void update() {
>       int numUpdates = 0;
>       updateList = new ArrayList<List<Update>>(logList.size());
>       deleteByQueryList = new ArrayList<Update>();
>       deleteList = new ArrayList<DeleteUpdate>();
>       updates = new HashMap<Long,Update>(numRecordsToKeep);
>       for (TransactionLog oldLog : logList) {
>         List<Update> updatesForLog = new ArrayList<Update>();
>         TransactionLog.ReverseReader reader = null;
>         try {
>           reader = oldLog.getReverseReader();
>           while (numUpdates < numRecordsToKeep) {
>             Object o = reader.next();
>             if (o==null) break;
>             try {
>               // should currently be a List<Oper,Ver,Doc/Id>
>               List entry = (List)o;
>               // TODO: refactor this out so we get common error handling
>               int opAndFlags = (Integer)entry.get(0);
>               if (latestOperation == 0) {
>                 latestOperation = opAndFlags;
>               }
>               int oper = opAndFlags & UpdateLog.OPERATION_MASK;
>               long version = (Long) entry.get(1);
>               switch (oper) {
>                 case UpdateLog.ADD:
>                 case UpdateLog.DELETE:
>                 case UpdateLog.DELETE_BY_QUERY:
>                   Update update = new Update();
>                   update.log = oldLog;
>                   update.pointer = reader.position();
>                   update.version = version;
>                   updatesForLog.add(update);
>                   updates.put(version, update);
>                   
>                   if (oper == UpdateLog.DELETE_BY_QUERY) {
>                     deleteByQueryList.add(update);
>                   } else if (oper == UpdateLog.DELETE) {
>                     deleteList.add(new DeleteUpdate(version, (byte[])entry.get(2)));
>                   }
>                   
>                   break;
>                 case UpdateLog.COMMIT:
>                   break;
>                 default:
>                   throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,  "Unknown
Operation! " + oper);
>               }
>             } catch (ClassCastException cl) {
>               log.warn("Unexpected log entry or corrupt log.  Entry=" + o, cl);
>               // would be caused by a corrupt transaction log
>             } catch (Exception ex) {
>               log.warn("Exception reverse reading log", ex);
>               break;
>             }
>           }
>         } catch (IOException e) {
>           // failure to read a log record isn't fatal
>           log.error("Exception reading versions from log",e);
>         } finally {
>           if (reader != null) reader.close();
>         }
>         updateList.add(updatesForLog);
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message