cassandra-user mailing list archives

From Kai Wang <dep...@gmail.com>
Subject Re: Corrupt SSTABLE over and over
Date Wed, 17 Aug 2016 09:42:04 GMT
This might not be good news for you, but in my experience C* 2.x on Windows
is not ready for production yet. I've seen various file-system-related
errors, and in one of the JIRAs I was told that major work (or rework) was
done in 3.x to improve C* stability on Windows.

On Tue, Aug 16, 2016 at 3:44 AM, Bryan Cheng <bryan@blockcypher.com> wrote:

> Hi Alaa,
>
> Sounds like you have problems that go beyond Cassandra, likely filesystem
> corruption or bad disks. I don't know enough about Windows to give you any
> specific advice, but I'd try a run of chkdsk to start.
>
> --Bryan
>
> On Fri, Aug 12, 2016 at 5:19 PM, Alaa Zubaidi (PDF) <alaa.zubaidi@pdf.com>
> wrote:
>
>> Hi Bryan,
>>
>> Changing disk_failure_policy to best_effort and running nodetool scrub
>> did not work; it generated another error:
>> java.nio.file.AccessDeniedException
>>
>> I also tried removing all files (data, commitlog, savedcaches) and restarting
>> the node fresh, and I am still getting corruption.
>>
>> And still nothing indicates there is a HW issue.
>> All other nodes are fine.
>>
>> Regards,
>> Alaa
>>
>>
>> On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng <bryan@blockcypher.com>
>> wrote:
>>
>>> Should also add that if the scope of corruption is _very_ large, and you
>>> have a good, aggressive repair policy (read: you are confident in the
>>> consistency of the data elsewhere in the cluster), you may just want to
>>> decommission and rebuild that node.
>>>
>>> On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng <bryan@blockcypher.com>
>>> wrote:
>>>
>>>> Looks like you're doing the offline scrub; have you tried online?
>>>>
>>>> Here's my typical process for corrupt SSTables.
>>>>
>>>> With disk_failure_policy set to stop, examine the failing sstables. If
>>>> they are very small (in the range of KB), it is unlikely that there is any
>>>> salvageable data there. Just delete them, start the machine, and schedule a
>>>> repair ASAP.
>>>>
>>>> If they are large, then they may be worth salvaging. If the scope of
>>>> corruption is reasonable (limited to a few sstables scattered among
>>>> different keyspaces), set disk_failure_policy to best_effort, start the
>>>> machine up, and run nodetool scrub. This is the online scrub, which is faster
>>>> than the offline scrub (at least as of 2.1.12, the last time I had to do this).
>>>>
>>>> Only if all else fails, attempt the very painful offline sstablescrub.
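
As a rough illustration of the triage above, here is a minimal Python sketch that sorts the failing Data.db files into "delete, then repair" versus "worth scrubbing" by size. The function name `triage_sstables` and the 64 KB cutoff are assumptions made for this example; the message itself only says "in the range of KB". The sketch deletes nothing and does not call nodetool.

```python
import os
import tempfile

# Hypothetical cutoff: the thread only says "very small (in the range of KB)".
SMALL_SSTABLE_BYTES = 64 * 1024


def triage_sstables(paths, small_bytes=SMALL_SSTABLE_BYTES):
    """Split failing SSTable data files into two buckets by on-disk size.

    Returns (delete_then_repair, try_scrub). Nothing is deleted here; the
    caller decides what to do with each bucket.
    """
    delete_then_repair, try_scrub = [], []
    for path in paths:
        size = os.path.getsize(path)
        if size < small_bytes:
            delete_then_repair.append(path)
        else:
            try_scrub.append(path)
    return delete_then_repair, try_scrub


if __name__ == "__main__":
    # Demo with throwaway files standing in for corrupt SSTables.
    with tempfile.TemporaryDirectory() as d:
        small = os.path.join(d, "la-1-big-Data.db")
        large = os.path.join(d, "la-2-big-Data.db")
        with open(small, "wb") as f:
            f.write(b"\0" * 1024)
        with open(large, "wb") as f:
            f.write(b"\0" * (128 * 1024))
        to_delete, to_scrub = triage_sstables([small, large])
        print("delete, then repair:", to_delete)
        print("try nodetool scrub:", to_scrub)
```

Files in the first bucket would be removed by hand with the node stopped and followed by a repair; files in the second bucket are candidates for the online scrub described above.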
>>>>
>>>> Is the VMware client Windows? (Trying to make sure it's not just the
>>>> host.) YMMV, but in the past Windows was somewhat of a neglected platform
>>>> wrt Cassandra. I think you'd have a much easier time getting help if running
>>>> Linux is an option here.
>>>>
>>>>
>>>>
>>>> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <
>>>> alaa.zubaidi@pdf.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks for your input...
>>>>> That's what I am afraid of.
>>>>> Did you find any HW error in the VMware and HW logs? Any indication
>>>>> that the HW is the reason? I need to make sure that this is the reason
>>>>> before asking the customer to spend more money.
>>>>>
>>>>> Thanks,
>>>>> Alaa
>>>>>
>>>>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee <peichieh@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is Cassandra running on a virtual server (VMware)?
>>>>>>
>>>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>>>> Maybe try with a larger heap allocated to sstablescrub.
>>>>>>
>>>>>> I ran into this sstable corruption as well (on Cassandra 1.2). First I
>>>>>> tried nodetool scrub, and it persisted; then offline sstablescrub, and it
>>>>>> still persisted; then I wiped the node, and it happened again. Finally I
>>>>>> changed the hardware (disk and memory), and things went well.
>>>>>>
>>>>>> hth
>>>>>>
>>>>>> jason
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF)
>>>>>> <alaa.zubaidi@pdf.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I have a 16-node cluster, Cassandra 2.2.1 on Windows, local installation
>>>>>> > (NOT on the cloud)
>>>>>> >
>>>>>> > and I am getting
>>>>>> > ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
>>>>>> > CassandraDaemon.java:183 - Exception in thread Thread[CompactionExecutor:2,1,main]
>>>>>> > org.apache.cassandra.io.FSReadError:
>>>>>> > org.apache.cassandra.io.sstable.CorruptSSTableException:
>>>>>> > org.apache.cassandra.io.compress.CorruptBlockException:
>>>>>> > (E:\........\la-4886-big-Data.db): corruption detected, chunk at 4969092 of
>>>>>> > length 10208.
>>>>>> >     at
>>>>>> > org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
>>>>>> > ~[apache-cassandra-2.2.1.jar:2.2.1]
>>>>>> > ....
>>>>>> > ....
>>>>>> > ERROR [CompactionExecutor:2] ....... FileUtils.java:463 - Exiting
>>>>>> > forcefully due to file system exception on startup, disk failure policy
>>>>>> > "stop"
>>>>>> >
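
To see which files and offsets are affected across many such log entries, the "corruption detected" message can be parsed mechanically. Below is a minimal Python sketch, assuming the message keeps the shape shown in the log above. The function name `parse_corruption` and the sample path are made up for illustration; the real path is elided in the original message.

```python
import re

# Matches the message shape quoted in the log above:
#   (<path>): corruption detected, chunk at <offset> of length <length>.
CORRUPTION_RE = re.compile(
    r"\((?P<path>.+?)\): corruption detected, "
    r"chunk at (?P<offset>\d+) of\s+length (?P<length>\d+)"
)


def parse_corruption(line):
    """Return (path, chunk_offset, chunk_length), or None if no match."""
    m = CORRUPTION_RE.search(line)
    if m is None:
        return None
    return m.group("path"), int(m.group("offset")), int(m.group("length"))


if __name__ == "__main__":
    # Hypothetical path, used only to exercise the parser.
    sample = (r"(E:\data\ks\tbl\la-4886-big-Data.db): corruption detected, "
              "chunk at 4969092 of length 10208.")
    print(parse_corruption(sample))
```

Grouping the results by path over a few days of logs would show whether the same files keep failing or whether corruption appears in fresh files each time, which is what this thread is trying to establish.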
>>>>>> > I tried sstablescrub but it crashed with hs-err-pid-...
>>>>>> > I removed the corrupted file and started the node again; after one day the
>>>>>> > corruption came back. I removed the files and restarted Cassandra, and it
>>>>>> > worked for a few days. Then I ran "nodetool repair"; after it finished,
>>>>>> > Cassandra failed again, but with commitlog corruption. After removing the
>>>>>> > commitlog files, it failed again with another sstable corruption.
>>>>>> >
>>>>>> > I was also checking the HW, file system, and memory. The VMware logs showed
>>>>>> > no HW error, and the HW management logs showed NO problems or issues.
>>>>>> > I also checked the Windows logs (Application and System); the only thing I
>>>>>> > found is in the System log: "Cassandra Service terminated with
>>>>>> > service-specific error Cannot create another system semaphore."
>>>>>> >
>>>>>> > I could not find anything regarding that error; all comments point to the
>>>>>> > application log.
>>>>>> >
>>>>>> > Any help is appreciated..
>>>>>> >
>>>>>> > --
>>>>>> >
>>>>>> > Alaa Zubaidi
>>>>>> >
>>>>>> >
>>>>>> > This message may contain confidential and privileged information. If it has
>>>>>> > been sent to you in error, please reply to advise the sender of the error
>>>>>> > and then immediately permanently delete it and all attachments to it from
>>>>>> > your systems. If you are not the intended recipient, do not read, copy,
>>>>>> > disclose or otherwise use this message or any attachments to it. The sender
>>>>>> > disclaims any liability for such unauthorized use. PLEASE NOTE that all
>>>>>> > incoming e-mails sent to PDF e-mail accounts will be archived and may be
>>>>>> > scanned by us and/or by external service providers to detect and prevent
>>>>>> > threats to our systems, investigate illegal or inappropriate behavior,
>>>>>> > and/or eliminate unsolicited promotional e-mails (“spam”). If you have any
>>>>>> > concerns about this process, please contact us at legal.department@pdf.com.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Alaa Zubaidi
>>>>> PDF Solutions, Inc.
>>>>> 333 West San Carlos Street, Suite 1000
>>>>> San Jose, CA 95110  USA
>>>>> Tel: 408-283-5639
>>>>> fax: 408-938-6479
>>>>> email: alaa.zubaidi@pdf.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Alaa Zubaidi
>> PDF Solutions, Inc.
>> 333 West San Carlos Street, Suite 1000
>> San Jose, CA 95110  USA
>> Tel: 408-283-5639
>> fax: 408-938-6479
>> email: alaa.zubaidi@pdf.com
>>
>>
>
>
