cassandra-user mailing list archives

From: "Laing, Michael" <michael.la...@nytimes.com>
Subject: Re: Cassandra Snapshots giving me corrupted SSTables in the logs
Date: Fri, 28 Mar 2014 20:32:44 GMT
+1 for tablesnap
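
For anyone finding this thread later, the incremental pull Jon describes in the quoted text could look roughly like the sketch below. It only wraps the `aws s3 sync` command from aws-cli, which copies new or changed objects rather than the whole dataset. The bucket name, prefix, and local directory are hypothetical placeholders, and the `run` parameter is an assumption added so the command can be inspected without executing it.

```python
import subprocess


def s3_sync_command(bucket, prefix, local_dir):
    """Build the aws-cli command that pulls new or changed objects only."""
    # Bucket, prefix and local_dir are placeholders supplied by the caller.
    return ["aws", "s3", "sync", f"s3://{bucket}/{prefix}", local_dir]


def pull_new_sstables(bucket, prefix, local_dir, run=subprocess.check_call):
    """Run the sync; `run` is injectable so the command can be dry-run."""
    run(s3_sync_command(bucket, prefix, local_dir))
```

On the remote, non-AWS server this would be called as, say, `pull_new_sstables("my-backups", "cassandra/node1", "/backups/node1")`, with aws-cli installed and credentials configured.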


On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:

> I will +1 the recommendation on using tablesnap over EBS.  S3 is at least
> predictable.
>
> Additionally, from a practical standpoint, you may want to back up your
> sstables somewhere.  If you use S3, it's easy to pull just the new tables
> out via aws-cli tools (s3 sync), to your remote, non-aws server, and not
> incur the overhead of routinely backing up the entire dataset.  For a
> non-trivial database, this matters quite a bit.
>
>
> On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael <michael.laing@nytimes.com> wrote:
>
>> As I tried to say, EBS snapshots require much care or you get corruption
>> such as you have encountered.
>>
>> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
>> xfs_freeze? Somehow I doubt it...
>>
>>
>> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad <jon@jonhaddad.com> wrote:
>>
>>> I have a nagging memory of reading about issues with virtualization and
>>> not actually having durable versions of your data even after an fsync
>>> (within the VM).  Googling around led me to this post:
>>> http://petercai.com/virtualization-is-bad-for-database-integrity/
>>>
>>> It's possible you're hitting this issue, either with the virtualization
>>> layer, or with EBS itself.  Just a shot in the dark though, other people
>>> would likely know much more than I.
>>>
>>>
>>>
>>> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie <ussray_00@yahoo.com> wrote:
>>>
>>>> Robert,
>>>>
>>>> That is what I thought as well.  But apparently something is happening.
>>>> The only way I can get away with doing this is adding a sleep 60 right
>>>> after the nodetool snapshot is executed.  I can reproduce this 100% of the
>>>> time by not issuing a sleep after nodetool snapshot.
>>>>
>>>> This is the error.
>>>>
>>>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>>>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>>>>     at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>>>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>>>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>>>>     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>     at java.lang.Thread.run(Thread.java:744)
>>>> Caused by: java.io.EOFException
>>>>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>>>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>>>     ... 11 more
>>>>
>>>>
>>>> On Friday, March 28, 2014 2:38 PM, Robert Coli <rcoli@eventbrite.com> wrote:
>>>> On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie <ussray_00@yahoo.com> wrote:
>>>>
>>>> Thank you for your quick response.
>>>>
>>>> Is there a way to tell when a snapshot is completely done?
>>>>
>>>>
>>>> IIRC, the JMX call blocks until the snapshot completes. It should be
>>>> done when nodetool returns.
>>>>
>>>> =Rob
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> skype: rustyrazorblade
>>>
>>
>>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
>
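
To make the quiescing point from the quoted thread concrete, here is a minimal sketch of freezing the file system around a volume snapshot instead of relying on a fixed sleep. It assumes `nodetool snapshot`, `sync`, and util-linux `fsfreeze` are on the PATH and that the script runs with enough privileges to freeze the mount. `take_volume_snapshot` is a hypothetical callable standing in for whatever starts the EBS snapshot, and the injectable `run` parameter is likewise an assumption so the sequence can be checked without touching a real system. This is not something Cassandra does for you, only one way the steps might be ordered.

```python
import subprocess


def frozen_snapshot(mount_point, take_volume_snapshot, run=subprocess.check_call):
    """Snapshot Cassandra, freeze the filesystem, snapshot the volume, thaw."""
    # nodetool is expected to return only once the Cassandra snapshot is done.
    run(["nodetool", "snapshot"])
    # Flush dirty pages, then block new writes to the mounted filesystem.
    run(["sync"])
    run(["fsfreeze", "--freeze", mount_point])
    try:
        # Hypothetical hook: start the EBS (or other volume) snapshot here.
        return take_volume_snapshot()
    finally:
        # Always thaw, even if the snapshot call raised.
        run(["fsfreeze", "--unfreeze", mount_point])
```

Writes to the frozen mount block until the thaw, so the window between freeze and unfreeze should be kept as short as possible.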

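A note on reading the quoted stack trace: the EOFException is raised inside `DataInputStream.readUnsignedShort`, called from `readUTF`. Java's `readUTF` expects a 2-byte big-endian length followed by that many bytes of string data, so failing at the length prefix suggests the compression metadata being opened was empty or truncated at that point (presumably the sstable's CompressionInfo.db component, though that is my assumption). The Python sketch below only mimics that framing for illustration; it is not Cassandra code, and it decodes the payload as plain UTF-8, which matches Java's modified UTF-8 for ordinary ASCII names only.

```python
import io
import struct


def read_utf(stream):
    """Mimic the framing of DataInputStream.readUTF: 2-byte length, then payload."""
    header = stream.read(2)
    if len(header) < 2:
        # This is the failure mode in the trace: no room for the length prefix.
        raise EOFError("stream ended before the 2-byte length prefix")
    (length,) = struct.unpack(">H", header)  # unsigned short, big-endian
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended inside the string payload")
    # Simplification: plain UTF-8 instead of Java's modified UTF-8.
    return payload.decode("utf-8")
```

Feeding it an empty `io.BytesIO(b"")` raises `EOFError` at the length prefix, which is consistent with a file that was copied before its contents had reached disk.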