cassandra-user mailing list archives

From Bryan Cheng <br...@blockcypher.com>
Subject Re: Corrupt SSTABLE over and over
Date Fri, 12 Aug 2016 19:00:01 GMT
Should also add that if the scope of corruption is _very_ large, and you
have a good, aggressive repair policy (read: you are confident in the
consistency of the data elsewhere in the cluster), you may just want to
decommission and rebuild that node.
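
Putting this together with the process quoted below, the decision could be sketched roughly as follows. This is only an illustration: the function name and both thresholds (SMALL_SSTABLE_BYTES, MAX_SCRUBBABLE_SSTABLES) are made up, since the thread only gives rough guidance (kilobyte-sized files, "a few sstables"), so adjust them to your own judgment.

```python
from enum import Enum


class Action(Enum):
    DELETE_AND_REPAIR = "delete the sstables, start the node, schedule a repair"
    ONLINE_SCRUB = "set disk_failure_policy to best_effort, start the node, run nodetool scrub"
    REBUILD_NODE = "decommission and rebuild the node"


# Hypothetical cut-offs, not taken from Cassandra or from the thread.
SMALL_SSTABLE_BYTES = 1024 * 1024   # below this, treat the file as "very small"
MAX_SCRUBBABLE_SSTABLES = 5         # above this, treat the corruption as "very large"


def triage(corrupt_sizes_bytes, repairs_trusted):
    """Pick a first step for a node with corrupt sstables.

    corrupt_sizes_bytes: sizes of the failing sstable data files, in bytes.
    repairs_trusted: True if you are confident the rest of the cluster
        holds consistent copies of the data (regular, aggressive repairs).
    """
    if not corrupt_sizes_bytes:
        raise ValueError("no corrupt sstables given")

    # Tiny files are unlikely to hold salvageable data.
    if all(size < SMALL_SSTABLE_BYTES for size in corrupt_sizes_bytes):
        return Action.DELETE_AND_REPAIR

    # Widespread corruption plus trustworthy replicas: rebuilding may be simpler.
    if len(corrupt_sizes_bytes) > MAX_SCRUBBABLE_SSTABLES and repairs_trusted:
        return Action.REBUILD_NODE

    # Otherwise try the online scrub first; offline sstablescrub is the last resort.
    return Action.ONLINE_SCRUB


if __name__ == "__main__":
    print(triage([4096, 12000], repairs_trusted=False).value)
```

The order of the checks mirrors the order of the advice: cheap deletion first, rebuild only when the replicas can be trusted, scrub otherwise.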

On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <bryan@blockcypher.com> wrote:

> Looks like you're doing the offline scrub. Have you tried online?
>
> Here's my typical process for corrupt SSTables.
>
> With disk_failure_policy set to stop, examine the failing sstables. If
> they are very small (in the range of KB), it is unlikely that there is any
> salvageable data there. Just delete them, start the machine, and schedule a
> repair ASAP.
>
> If they are large, then it may be worth salvaging. If the scope of
> corruption is reasonable (limited to a few sstables scattered among
> different keyspaces), set disk_failure_policy to best_effort, start the
> machine up, and run nodetool scrub. This is the online scrub, which is faster
> than the offline scrub (at least as of 2.1.12, the last time I had to do this).
>
> Only if all else fails, attempt the very painful offline sstablescrub.
>
> Is the VMware guest Windows? (Trying to make sure it's not just the host.)
> YMMV, but in the past Windows was somewhat of a neglected platform wrt
> Cassandra. I think you'd have a much easier time getting help if running
> Linux is an option here.
>
>
>
> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <alaa.zubaidi@pdf.com>
> wrote:
>
>> Hi Jason,
>>
>> Thanks for your input...
>> That's what I am afraid of.
>> Did you find any HW errors in the VMware and HW logs? Any indication that
>> the HW is the reason? I need to make sure that this is the reason before
>> asking the customer to spend more money.
>>
>> Thanks,
>> Alaa
>>
>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <peichieh@gmail.com> wrote:
>>
>>> Is Cassandra running on a virtual server (VMware)?
>>>
>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>> Maybe try with a larger heap allocated to sstablescrub.
>>>
>>> I ran into this sstable corruption as well (on Cassandra 1.2). First I
>>> tried nodetool scrub, but it persisted. Then I tried offline sstablescrub,
>>> and it still persisted. I wiped the node and it happened again. Then I
>>> changed the hardware (disk and memory), and things went well.
>>>
>>> hth
>>>
>>> jason
>>>
>>>
>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
>>> <alaa.zubaidi@pdf.com> wrote:
>>> > Hi,
>>> >
>>> > I have a 16 Node cluster, Cassandra 2.2.1 on Windows, local installation
>>> > (NOT on the cloud)
>>> >
>>> > and I am getting
>>> > ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
>>> > CassandraDaemon.java:183 - Exception in thread
>>> > Thread[CompactionExecutor:2,1,main]
>>> > org.apache.cassandra.io.FSReadError:
>>> > org.apache.cassandra.io.sstable.CorruptSSTableException:
>>> > org.apache.cassandra.io.compress.CorruptBlockException:
>>> > (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
>>> > length 10208.
>>> >     at
>>> > org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
>>> > ~[apache-cassandra-2.2.1.jar:2.2.1]
>>> > ....
>>> > ....
>>> > ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
>>> > forcefully due to file system exception on startup, disk failure policy
>>> > "stop"
>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>> > I removed the corrupted file and started the node again. After one day the
>>> > corruption came back, so I removed the files and restarted Cassandra. It
>>> > worked for a few days. Then I ran "nodetool repair"; after it finished,
>>> > Cassandra failed again, this time with commitlog corruption. After removing
>>> > the commitlog files, it failed again with another sstable corruption.
>>> >
>>> > I also checked the HW, file system, and memory. The VMware logs showed
>>> > no HW errors, and the HW management logs showed NO problems or issues.
>>> > I also checked the Windows logs (Application and System). The only thing I
>>> > found is in the System log: "Cassandra Service terminated with
>>> > service-specific error Cannot create another system semaphore."
>>> >
>>> > I could not find anything regarding that error; all comments point to
>>> > the application log.
>>> >
>>> > Any help is appreciated..
>>> >
>>> > --
>>> >
>>> > Alaa Zubaidi
>>> >
>>> >
>>> > This message may contain confidential and privileged information. If it has
>>> > been sent to you in error, please reply to advise the sender of the error
>>> > and then immediately permanently delete it and all attachments to it from
>>> > your systems. If you are not the intended recipient, do not read, copy,
>>> > disclose or otherwise use this message or any attachments to it. The sender
>>> > disclaims any liability for such unauthorized use. PLEASE NOTE that all
>>> > incoming e-mails sent to PDF e-mail accounts will be archived and may be
>>> > scanned by us and/or by external service providers to detect and prevent
>>> > threats to our systems, investigate illegal or inappropriate behavior,
>>> > and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
>>> > concerns about this process, please contact us at legal.department@pdf.com.
>>>
>>
>>
>>
>> --
>>
>> Alaa Zubaidi
>> PDF Solutions, Inc.
>> 333 West San Carlos Street, Suite 1000
>> San Jose, CA 95110  USA
>> Tel: 408-283-5639
>> fax: 408-938-6479
>> email: alaa.zubaidi@pdf.com
>>
>>
>
>
