incubator-cassandra-user mailing list archives

From Gary Dusbabek <gdusba...@gmail.com>
Subject Re: Having trouble getting cassandra to stay up
Date Mon, 27 Dec 2010 14:18:58 GMT
You might want to try starting over.  Configure your initial keyspaces
in conf/cassandra.yaml and load them into your cluster with
bin/schematool.
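
For example, the initial schema in a 0.7-era conf/cassandra.yaml might be sketched like this (the keyspace and column family names here are placeholders, not from this thread, and the exact field names should be checked against the sample yaml shipped with your release):

```yaml
# Hypothetical initial-schema block for conf/cassandra.yaml (0.7-era).
# "testing" and "users" are placeholder names, not from this thread.
keyspaces:
    - name: testing
      replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
      replication_factor: 1
      column_families:
        - name: users
          compare_with: BytesType
```

Once the nodes are up, something like `bin/schematool <host> <jmx-port> import` should push that definition into the cluster (run schematool with no arguments to check the exact usage on your build).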

That nasty stack trace indicates the server is receiving data that is
not formatted the way it expects: the two Avro CfDef schemas in your log
differ (the expected one has a replicate_on_write field that the received
one lacks).  Please verify that your Cassandra servers are both running
the same version.

Your earlier error when adding a keyspace through pycassa was
confusing: you stated that you tried to create a keyspace, but the
traceback you pasted comes from a drop_keyspace call.  Something
doesn't add up.

Gary.


On Fri, Dec 24, 2010 at 11:48, Alex Quan <alex.quan@tinkur.com> wrote:
> Sorry, but I am not sure how to answer all of the questions you have posed,
> since a lot of what I am working with is quite new to me and I haven't
> used many of the tools mentioned, but I will try to answer
> to the best of my knowledge. I am trying to get Cassandra
> to run across 2 nodes, both Amazon EC2 micro instances; I believe
> they are running 64-bit Ubuntu Linux 10.01 with Java 1.6.0_23. When
> I said "killed", that was what was printed to the console when the process
> died, so I am not sure exactly what that means. Here is some of the info
> before Cassandra went down:
>
> ring:
>
> Address         Status State   Load            Owns    Token
>                                                        111232248257764777335763873822010980488
> 10.127.155.205  Up     Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
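
As a sanity check, the Owns percentages above can be reproduced from the tokens alone: under RandomPartitioner each node owns the arc of the 2^127 token space from its predecessor's token up to its own. A minimal sketch using the two tokens from the ring output:

```python
# Reproduce nodetool ring's "Owns" column for a RandomPartitioner ring.
# Each node owns the arc from its predecessor's token up to its own
# token, out of a total token space of 2**127.

RING_SIZE = 2 ** 127

def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    owns = {}
    # Pair every token with its predecessor (the last token wraps around).
    for prev, cur in zip([ordered[-1]] + ordered[:-1], ordered):
        owns[cur] = ((cur - prev) % RING_SIZE) / RING_SIZE
    return owns

# Tokens from the two-node ring in this thread.
tokens = [
    41570168072350555868554892080805525145,   # 10.127.155.205
    111232248257764777335763873822010980488,  # 10.122.123.210
]

for tok, frac in sorted(ownership(tokens).items()):
    print("%s... owns %.2f%%" % (str(tok)[:8], frac * 100))
```

Running this reproduces the 59.06% / 40.94% split shown above, so the uneven ownership is simply a consequence of where the two tokens landed.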
>
> vmstat before cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 328196    632  13936    0    0    12     4   13    1  0  0 99  0
>
> vmstat after cassandra is up:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  2      0   5660    116  10312    0    0    12     4   13    1  0  0 99  0
>
> Then, after I run a line like sys.create_keyspace('testing', 1) in pycassa
> with the connection set up to point to my machine, I get the following error:
>
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/system_manager.py", line 365, in drop_keyspace
>     schema_version = self._conn.system_drop_keyspace(keyspace)
>   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1255, in system_drop_keyspace
>     return self.recv_system_drop_keyspace()
>   File "/usr/local/lib/python2.6/dist-packages/pycassa-1.0.2-py2.6.egg/pycassa/cassandra/Cassandra.py", line 1266, in recv_system_drop_keyspace
>     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
>     sz = self.readI32()
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
>     buff = self.trans.readAll(4)
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
>     chunk = self.read(sz-have)
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 272, in read
>     self.readFrame()
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 276, in readFrame
>     buff = self.__trans.readAll(4)
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TTransport.py", line 58, in readAll
>     chunk = self.read(sz-have)
>   File "/usr/local/lib/python2.6/dist-packages/thrift05-0.5.0-py2.6-linux-x86_64.egg/thrift/transport/TSocket.py", line 108, in read
>     raise TTransportException(type=TTransportException.END_OF_FILE, message='TSocket read 0 bytes')
> thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
>
> and then Cassandra on the machine dies. Here is some of the log from
> the machine that died:
>
>  INFO [FlushWriter:1] 2010-12-24 03:24:01,999 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/LocationInfo-e-24-Data.db
> (301 bytes)
>  INFO [main] 2010-12-24 03:24:02,003 Mx4jTool.java (line 73) Will not load
> MX4J, mx4j-tools.jar is not in the classpath
>  INFO [main] 2010-12-24 03:24:02,048 CassandraDaemon.java (line 77) Binding
> thrift service to /0.0.0.0:9160
>  INFO [main] 2010-12-24 03:24:02,050 CassandraDaemon.java (line 91) Using
> TFramedTransport with a max frame size of 15728640 bytes.
>  INFO [main] 2010-12-24 03:24:02,053 CassandraDaemon.java (line 119)
> Listening for thrift clients...
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Migrations at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,226 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Migrations@948345082(5902 bytes, 1
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:42,226 Memtable.java (line 155)
> Writing Memtable-Migrations@948345082(5902 bytes, 1 operations)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 639) switching in a fresh Memtable for Schema at
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293161040907.log',
> position=10873)
>  INFO [MigrationStage:1] 2010-12-24 03:26:42,238 ColumnFamilyStore.java
> (line 943) Enqueuing flush of Memtable-Schema@212165140(2194 bytes, 3
> operations)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,351 Memtable.java (line 162)
> Completed flushing /var/lib/cassandra/data/system/Migrations-e-11-Data.db
> (6035 bytes)
>  INFO [FlushWriter:1] 2010-12-24 03:26:45,531 Memtable.java (line 155)
> Writing Memtable-Schema@212165140(2194 bytes, 3 operations)
>
> and the log on the machine that stays up:
>
> ERROR [ReadStage:4] 2010-12-24 03:24:01,979 AbstractCassandraDaemon.java
> (line 90) Fatal exception in thread Thread[ReadStage:4,5,main]
> org.apache.avro.AvroTypeException: Found
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}]}},"null"]}]},
> expecting
> {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":["boolean","null"]},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"memtable_flush_after_mins","type":["int","null"],"default":60},{"name":"memtable_throughput_in_mb","type":["null","int"],"default":null},{"name":"memtable_operations_in_millions","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
>     at org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:98)
>     at org.apache.cassandra.db.migration.Migration.deserialize(Migration.java:274)
>     at org.apache.cassandra.db.DefinitionsUpdateResponseVerbHandler.doVerb(DefinitionsUpdateResponseVerbHandler.java:56)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 Gossiper.java (line 583) Node
> /10.127.155.205 has restarted, now UP again
>  INFO [GossipStage:1] 2010-12-24 03:24:02,151 StorageService.java (line 670)
> Node /10.127.155.205 state jump to normal
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,151 HintedHandOffManager.java
> (line 191) Started hinted handoff for endpoint /10.127.155.205
>  INFO [HintedHandoff:1] 2010-12-24 03:24:02,152 HintedHandOffManager.java
> (line 247) Finished hinted handoff of 0 rows to endpoint /10.127.155.205
>  INFO [WRITE-/10.127.155.205] 2010-12-24 03:26:47,789
> OutboundTcpConnection.java (line 115) error writing to /10.127.155.205
>  INFO [ScheduledTasks:1] 2010-12-24 03:26:58,899 Gossiper.java (line 195)
> InetAddress /10.127.155.205 is now dead.
>
> The ring output on my node that stays up:
>
> Address         Status State   Load            Owns    Token
>                                                        111232248257764777335763873822010980488
> 10.127.155.205  Down   Normal  85.17 KB        59.06%  41570168072350555868554892080805525145
> 10.122.123.210  Up     Normal  91.1 KB         40.94%  111232248257764777335763873822010980488
>
> I am not sure how to use the JMX tools to connect to these machines, so I
> can't really answer that, but hopefully this is enough information to
> diagnose my problem. Thanks,
>
> Alex
>
>
> On Thu, Dec 23, 2010 at 4:35 PM, Dan Hendry <dan.hendry.junk@gmail.com>
> wrote:
>>
>> Your details are rather vague; what do you mean by "killed"? Is the
>> Cassandra java process still running? Are there any other warning or error
>> log messages (from either node)? Could you provide the last few Cassandra
>> log lines from each machine? Can you connect to the node via JMX? What is
>> the output of nodetool ring from the second node (which is presumably still
>> alive)? Is there any unusual system activity: high CPU usage, low CPU usage,
>> problems with disk IO (can be checked with vmstat)?
>> Can you provide any further system information? Linux/Windows, Java
>> version, 32/64 bit, amount of RAM?
>>
>> On Thu, Dec 23, 2010 at 1:42 PM, Alex Quan <alex.quan@tinkur.com> wrote:
>>>
>>> Hi,
>>>
>>> I am a newbie to Cassandra and am using Cassandra RC 2. I initially had
>>> Cassandra working on one node and was able to create keyspaces and column
>>> families and populate the database fine. I tried adding a second node by
>>> changing the seed to point to another node and setting listen_address and
>>> rpc_address to blank. I then started up the second node, and it seems to
>>> have connected fine according to nodetool, but after that I couldn't get it
>>> to accept any commands, and whenever I tried to make a new keyspace or
>>> column family it would kill my initial node after a message like this:
>>>
>>>  INFO 18:19:49,335 switching in a fresh Memtable for Schema at
>>> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1293127746481.log',
>>> position=9143)
>>>  INFO 18:19:49,335 Enqueuing flush of Memtable-Schema@1358138608(2410
>>> bytes, 5 operations)
>>> Killed
>>>
>>> and the next few times I start up the server a similar message pops up
>>> until, I am guessing, all the pending data is flushed out; then it starts
>>> fine until I try to add anything to it. I tried changing the yaml file back
>>> to the original setup and this still happens. I don't know what else to try
>>> to get it to work properly; if you can help I would be really grateful.
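
For reference, the two-node setup described above comes down to a few cassandra.yaml settings on the joining node. A minimal sketch, assuming 0.7-era configuration keys (the seed address is a placeholder, not taken from the thread):

```yaml
# Sketch of the cassandra.yaml settings involved in joining a second
# 0.7-era node. <first-node-ip> is a placeholder for the existing
# node's address.
cluster_name: 'Test Cluster'
seeds:
    - <first-node-ip>
# Left blank, as described above, so Cassandra derives the addresses
# from the node's own configuration.
listen_address:
rpc_address:
```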
>>>
>>> Alex
>>
>
>
