cassandra-commits mailing list archives

From "Cathy Daw (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-2930) corrupt commitlog
Date Thu, 26 Jan 2012 23:01:42 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194247#comment-13194247 ]

Cathy Daw commented on CASSANDRA-2930:
--------------------------------------

I was able to reproduce this on C* 1.0.7 with the following script.  While the script was
running, Pycassa reported a timeout error, and on the next server startup I received the
stack trace from the issue description.

{code}
#!/usr/bin/python

import time
import pycassa
from pycassa import system_manager
from pycassa.system_manager import *
from threading import Thread

def test_setup():
    m = pycassa.system_manager.SystemManager('localhost:9160')

    pool = pycassa.pool.ConnectionPool('testks', server_list=['localhost:9160', ], timeout=5,
                                       pool_size=16, max_overflow=0, prefill=False, pool_timeout=30, max_retries=8)

    kspaces = m.list_keyspaces()

    if 'testks' in kspaces:
        m.drop_keyspace('testks')

    m.create_keyspace('testks', SIMPLE_STRATEGY, {'replication_factor': '1'})

    cfs = m.get_keyspace_column_families('testks')

    if 'test_super_cf' not in cfs:
        m.create_column_family('testks', 'test_super_cf', super=True, key_validation_class=system_manager.ASCII_TYPE,
                               comparator_type=system_manager.INT_TYPE, subcomparator_type=system_manager.INT_TYPE,
                               default_validation_class=system_manager.INT_TYPE)

    global db 
    db = pycassa.ColumnFamily(pool, 'test_super_cf',read_consistency_level=pycassa.ConsistencyLevel.QUORUM,
                              write_consistency_level=pycassa.ConsistencyLevel.QUORUM)


def test_mutation():
    global keepGoing
    keepGoing=True
    
    global currentTest
    
# Test #1
    print "Thread 1: Start test1"
    currentTest = 'test1'
    
    for a in range(0, 1000):
        db.insert('testrow', {a : dict([(i, i) for i in range(100000)])})
    
    print "Thread 1: Finish test1"

# Test #2
    print "Thread 1: Start test2"
    currentTest = 'test2'
    db.remove('testrow', super_column=400, columns=dict([(i, i) for i in range(100000)]))
    print "Thread 1: Finish test2"

# Test #3
    print "Thread 1: Start test3"
    currentTest = 'test3'

    for a in range(501, 1000):
        db.remove('testrow', super_column=a)

    print "Thread 1: Finish test3"

    time.sleep(3)
    keepGoing=False

def check_rowcount():

    while keepGoing:
        test = currentTest
        currentSuperColCount = db.get_count('testrow')
        currentSub1ColCount = db.get_count('testrow', super_column=1)
        currentSub400ColCount = db.get_count('testrow', super_column=400)

        if currentTest != 'test1' and currentSuperColCount not in [0, 500, 1000]:
            print "--- Thread 2: Mismatch Super Column Count.  # Super Columns: " + str(currentSuperColCount) + ".  Current Test: " + test

        if currentSub1ColCount not in [0, 100000]:
            print "--- Thread 2: Mismatch Sub Column Count.  # for column 1: " + str(currentSub1ColCount) + ".  Current Test: " + test

        if currentSub400ColCount not in [0, 100000]:
            print "--- Thread 2: Mismatch Sub Column Count.  # for column 400: " + str(currentSub400ColCount) + ".  Current Test: " + test


if __name__ == '__main__':
    test_setup()

    t1 = Thread(target=test_mutation)
    t1.start()

    time.sleep(1)

    t2 = Thread(target=check_rowcount)
    t2.start()
{code}
                
> corrupt commitlog
> -----------------
>
>                 Key: CASSANDRA-2930
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2930
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.1
>         Environment: Linux, amd64.
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
>            Reporter: ivan
>            Assignee: Rick Branson
>         Attachments: CommitLog-1310637513214.log
>
>
> We get "Exception encountered during startup" error while Cassandra starts.
> Error messages:
>  INFO 13:56:28,736 Finished reading /var/lib/cassandra/commitlog/CommitLog-1310637513214.log
> ERROR 13:56:28,736 Exception encountered during startup.
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236)
>         at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>         at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:419)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:139)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:127)
>         at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:382)
>         at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:278)
>         at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:158)
>         at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:175)
>         at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:368)
>         at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:87)
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261)
>         ... 13 more
> Exception encountered during startup.
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236)
>         at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>         at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:419)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:139)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:127)
>         at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:382)
>         at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:278)
>         at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:158)
>         at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:175)
>         at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:368)
>         at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:87)
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261)
>         ... 13 more
> After some debugging I found that in some serialized supercolumns the column count is less than the number of columns actually serialized. The difference was always 1 in the corrupt commitlogs. The error always appears in supercolumns with more than one column, but the commitlog also contains properly serialized supercolumns.
> I have no clue yet why this error happens. I suspect it may be a race condition.
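
The count mismatch described in the report can be illustrated with a toy format. The sketch below is not Cassandra's commit log layout: the record structure (a 4-byte column count, a 2-byte name length, a 4-byte value length) and every function name are assumptions made for illustration only. It shows how a count that is one less than the number of columns actually written leaves a stray column in the stream, so that the next read interprets column bytes as a count and eventually runs off the end of the data.

```python
import io
import struct


def read_exact(buf, n):
    # Read exactly n bytes or fail, similar in spirit to a readFully call.
    data = buf.read(n)
    if len(data) != n:
        raise EOFError("wanted %d bytes, got %d" % (n, len(data)))
    return data


def write_column(buf, name, value):
    # Hypothetical layout: 2-byte name length, name, 4-byte value length, value.
    buf.write(struct.pack(">H", len(name)))
    buf.write(name)
    buf.write(struct.pack(">I", len(value)))
    buf.write(value)


def read_column(buf):
    (name_len,) = struct.unpack(">H", read_exact(buf, 2))
    name = read_exact(buf, name_len)
    (value_len,) = struct.unpack(">I", read_exact(buf, 4))
    value = read_exact(buf, value_len)
    return name, value


def write_supercolumn(buf, columns, declared_count=None):
    # declared_count lets the demo write a count that disagrees with the data.
    count = len(columns) if declared_count is None else declared_count
    buf.write(struct.pack(">I", count))
    for name, value in columns:
        write_column(buf, name, value)


def read_supercolumn(buf):
    # The reader trusts the declared count, as any count-prefixed reader must.
    (count,) = struct.unpack(">I", read_exact(buf, 4))
    return [read_column(buf) for _ in range(count)]


def demo():
    buf = io.BytesIO()
    columns = [(b"c1", b"v1"), (b"c2", b"v2"), (b"c3", b"v3")]
    # Three columns are written, but the count says two (off by one).
    write_supercolumn(buf, columns, declared_count=2)
    write_supercolumn(buf, [(b"d1", b"w1")])
    buf.seek(0)

    first = read_supercolumn(buf)  # reads only two of the three columns
    try:
        # The stray third column is now misread as the next record's header.
        read_supercolumn(buf)
    except EOFError as exc:
        return first, str(exc)
    return first, None


if __name__ == "__main__":
    first, error = demo()
    print("columns read from first record: %d" % len(first))
    print("second record failed with: %s" % error)
```

In this toy format the first record appears to read successfully, and the failure only surfaces on the following record. That would be consistent with an EOFException raised deep inside column deserialization during replay, although the sketch says nothing about what actually causes the mismatch in Cassandra.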
