lucene-solr-user mailing list archives

From simon <mtnes...@gmail.com>
Subject Re: checksum failed (hardware problem?)
Date Wed, 26 Sep 2018 13:44:43 GMT
I saw something like this a year ago, which I reported as a possible bug
(https://issues.apache.org/jira/browse/SOLR-10840, which has a full
description and stack traces).

This occurred very randomly on an AWS instance; moving the index directory
to a different file system did not fix the problem. Eventually I cloned our
environment to a new AWS instance, which proved to be the solution. Why, I
have no idea...

-Simon
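
For anyone reading this thread later, here is a minimal sketch of why a single bad bit on disk shows up as "checksum failed". It assumes a CRC32 over the file contents (the 8-hex-digit expected/actual values in the trace quoted below are consistent with a 32-bit checksum, but that is an assumption here). The names ChecksumSketch, checksum and verify are made up for illustration and are not Lucene or Solr APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumSketch {

    // CRC32 over the whole payload, returned as an unsigned 32-bit value in a long.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Hypothetical helper: recompute the checksum and compare it with the stored one.
    static void verify(byte[] data, long expected) {
        long actual = checksum(data);
        if (actual != expected) {
            throw new IllegalStateException("checksum failed: expected="
                    + Long.toHexString(expected) + " actual=" + Long.toHexString(actual));
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of an index file (not a real Lucene file format).
        byte[] segment = "pretend these bytes are a segment file"
                .getBytes(StandardCharsets.UTF_8);

        long stored = checksum(segment); // value recorded when the file was written
        verify(segment, stored);         // clean read: no exception

        segment[5] ^= 0x01;              // simulate one bit flipped on disk
        try {
            verify(segment, stored);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the check cannot tell where the bad bit came from: a failing disk, a flaky controller, or a truncated write all produce the same mismatch, which is why the message ends with a question mark.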

On Mon, Sep 24, 2018 at 1:13 PM, Susheel Kumar <susheel2777@gmail.com>
wrote:

> Got it. I'll have the hardware folks check first, and if they don't find
> anything suspicious then I'll return here.
>
> Wondering if anybody has seen a similar error and whether they were able to
> confirm that it was a hardware fault.
>
> Thanks
>
> On Mon, Sep 24, 2018 at 1:01 PM Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > Mind you it could _still_ be Solr/Lucene, but let's check the hardware
> > first ;)
> > On Mon, Sep 24, 2018 at 9:50 AM Susheel Kumar <susheel2777@gmail.com>
> > wrote:
> > >
> > > Hi Erick,
> > >
> > > Thanks so much for your reply.  I'll now look into possible hardware
> > > issues rather than Solr/Lucene.
> > >
> > > Thanks again.
> > >
> > > On Mon, Sep 24, 2018 at 12:43 PM Erick Erickson <erickerickson@gmail.com>
> > > wrote:
> > >
> > > > There are several reasons this would "suddenly" start appearing.
> > > > 1> Your disk went bad and some sector is no longer faithfully
> > > > recording the bits. In this case the checksum will be wrong.
> > > > 2> You ran out of disk space at some point and the index was corrupted.
> > > > This isn't really a hardware problem.
> > > > 3> Your disk controller is going wonky and not reading reliably.
> > > >
> > > > The "possible hardware issue" message is to alert you that this is
> > > > highly unusual and you should at least consider doing integrity
> > > > checks on your disk before assuming it's a Solr/Lucene problem.
> > > >
> > > > Best,
> > > > Erick
> > > > On Mon, Sep 24, 2018 at 9:26 AM Susheel Kumar <susheel2777@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I am still trying to understand the corrupt index exception we saw in our
> > > > > logs. What does the "hardware problem" comment indicate here?  Does that
> > > > > mean it was most likely caused by a hardware issue?
> > > > >
> > > > > We never had this problem in the last couple of months. Solr is 6.6.2 and
> > > > > ZK: 3.4.10.
> > > > >
> > > > > Please share your thoughts.
> > > > >
> > > > > Thanks,
> > > > > Susheel
> > > > >
> > > > > Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed *(hardware problem?)* : expected=db243d1a actual=7a00d3d2 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs") [slice=_i27s_Lucene50_0.tim])
> > > > >
> > > > > It suddenly started appearing in the logs; before that there was no such
> > > > > error. Searches and ingestion all seemed to be working prior to that.
> > > > >
> > > > > ----
> > > > >
> > > > > 2018-09-03 17:16:49.056 INFO  (qtp834133664-519872) [c:COLL s:shard1 r:core_node1 x:COLL_shard1_replica1] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: newid=G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US
> > > > > 2018-09-03 17:16:49.057 ERROR (qtp834133664-519872) [c:COLL s:shard1 r:core_node1 x:COLL_shard1_replica1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id G31MXMRZESC0CYPR!A-G31MXMRZESC0CYPR.2552019802_1-2552008480_1-en_US to the index; possible analysis error.
> > > > > at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
> > > > > at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
> > > > > at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > > > > at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
> > > > > at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
> > > > > at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
> > > > > at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
> > > > > at org.apache.solr.update.processor.StatelessScriptUpdateProcessorFactory$ScriptUpdateProcessor.processAdd(StatelessScriptUpdateProcessorFactory.java:380)
> > > > > at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
> > > > > at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:180)
> > > > > at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
> > > > > at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
> > > > > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> > > > > at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:122)
> > > > > at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
> > > > > at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
> > > > > at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
> > > > > at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:187)
> > > > > at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:108)
> > > > > at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
> > > > > at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
> > > > > at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> > > > > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> > > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> > > > > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> > > > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> > > > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> > > > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> > > > > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> > > > > at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> > > > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > > > at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > > > > at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> > > > > at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> > > > > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> > > > > at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> > > > > at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> > > > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > > > > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> > > > > at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> > > > > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> > > > > at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> > > > > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> > > > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > > > > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > > > > at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> > > > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> > > > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > > > > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> > > > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> > > > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> > > > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> > > > > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> > > > > at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> > > > > at java.lang.Thread.run(Thread.java:748)
> > > > > Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
> > > > > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:749)
> > > > > at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:763)
> > > > > at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1567)
> > > > > at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:924)
> > > > > at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
> > > > > at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
> > > > > at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
> > > > > at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
> > > > > ... 54 more
> > > > > Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=db243d1a actual=7a00d3d2 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/app/solr/data/COLL_shard1_replica1/data/index/_i27s.cfs") [slice=_i27s_Lucene50_0.tim]))
> > > > > at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
> > > > > at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:526)
> > > > > at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.checkIntegrity(BlockTreeTermsReader.java:336)
> > > > > at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.checkIntegrity(PerFieldPostingsFormat.java:348)
> > > > > at org.apache.lucene.codecs.perfield.PerFieldMergeState$FilterFieldsProducer.checkIntegrity(PerFieldMergeState.java:271)
> > > > > at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:96)
> > > > > at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
> > > > > at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:216)
> > > > > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)
> > > > > at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
> > > > > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
> > > > > at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:188)
> > > > > at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
> > > > > at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
> > > > >
> > > > > 2018-09-03 17:16:49.116 INFO  (qtp834133664-519872) [c:COLL s:shard1 r:core_node1 x:COLL_shard1_replica1] o.a.s.c.S.Request [COLL_shard1_replica1]  webapp=/solr path=/update params={wt=javabin&version=2} status=400 QTime=69
> > > >
> >
>
