cassandra-user mailing list archives

From Ching-Cheng Chen <cc...@evidentsoftware.com>
Subject Re: UnserializableColumnFamilyException: Couldn't find cfId
Date Fri, 21 Jan 2011 16:44:51 GMT
We had a similar exception before, and the root cause was what Aaron
described.

You will hit this exception if your code creates a CF on the fly and data
is inserted into a node that has not synced the schema yet.

You have to call describe_schema_versions() and make sure all nodes report
the same schema version before you start inserting data into the newly
created CF.
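
As a rough illustration of that check, here is a minimal Python sketch. It assumes your client exposes describe_schema_versions() and that the call returns a map of schema version to the list of hosts on that version, with unreachable hosts grouped under an "UNREACHABLE" key. The fetch_versions callable and both function names are hypothetical placeholders, not part of any client library; wire fetch_versions to whatever your client provides.

```python
import time
from typing import Callable, Dict, List

# Assumption: unreachable hosts are reported under this key.
UNREACHABLE = "UNREACHABLE"


def schema_agreed(versions: Dict[str, List[str]]) -> bool:
    """True when every node is reachable and reports the same schema version."""
    live = {v for v, hosts in versions.items() if v != UNREACHABLE and hosts}
    # Conservative choice: an unreachable node counts as "not agreed".
    return len(live) == 1 and not versions.get(UNREACHABLE)


def wait_for_schema_agreement(
    fetch_versions: Callable[[], Dict[str, List[str]]],  # hypothetical wrapper around the client call
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the cluster agrees on one schema version or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if schema_agreed(fetch_versions()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

The idea is to call wait_for_schema_agreement() right after creating the CF and to start inserting only if it returns True. Treating unreachable nodes as disagreement is a deliberately cautious assumption; relax it if that is too strict for your cluster.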

Regards,

Chen

On Thu, Jan 20, 2011 at 5:34 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> Sounds like there are multiple versions of your schema around the cluster.
> What client API are you using? Does it support
> the describe_schema_versions() function? This will tell you how many
> versions there are.
>
> The easy solution here is to scrub the data and start a new 0.7 cluster
> using the release version. If possible, you should not use data created in
> non-release versions once you get to production.
>
> Hope that helps.
> Aaron
>
>
> On 21 Jan, 2011, at 09:15 AM, Oleg Proudnikov <olegp@cloudorange.com>
> wrote:
>
> Hi All,
>
> Could you please help me understand the impact on my data?
>
> I am running a 6 node 0.7-rc4 Cassandra cluster with RF=2. Schema was
> defined when the cluster was created and did not change. I am doing batch
> load with CL=ONE. The cluster is under some stress in memory and I/O. Each
> node has 1G heap. CPU is around 10% but the latency is high.
>
> I saw this exception on 2 out of 6 nodes in a relatively short window of
> time.
> Hector clients received no exception and the nodes continued running. The
> exception has not happened since even though the load is continuing.
> I do get an occasional OOM and I am adjusting thresholds and other
> settings as I go. I also doubled RAM to 2G since the exception.
>
> Here is the exception - the same stack trace in all cases.
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1004
> at org.apache.cassandra.db.ColumnFamilySerializer.deserialize
> (ColumnFamilySerializer.java:117)
> at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps
> (RowMutation.java:385)
> at org.apache.cassandra.db.RowMutationSerializer.deserialize
> (RowMutation.java:395)
> at org.apache.cassandra.db.RowMutationSerializer.deserialize
> (RowMutation.java:353)
> at org.apache.cassandra.db.RowMutationVerbHandler.doVerb
> (RowMutationVerbHandler.java:52)
> at org.apache.cassandra.net.MessageDeliveryTask.run
> (MessageDeliveryTask.java:63)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
>
>
> It refers to two cfIds - cfId=1004 and cfId=1013. Mutation stages are
> always different, even for exceptions appearing within the same
> millisecond. As you can see below, cfId=1004 appears on both nodes several
> times but at different times, while cfId=1013 appears only once on one node.
>
> It happened as a group within one second on one node and in 5 groups spread
> across 45 minutes on another node. I left the first log entry of each
> group.
>
> xxx.xxx.xxx.140 grep -i cfid -B 1 log/cassandra.log
> xxx.xxx.xxx.141 grep -i cfid -B 1 log/cassandra.log
> xxx.xxx.xxx.142 grep -i cfid -B 1 log/cassandra.log
> xxx.xxx.xxx.143 grep -i cfid -B 1 log/cassandra.log
>
>
> xxx.xxx.xxx.144 grep -i cfid -B 1 log/cassandra.log
> ERROR [MutationStage:11] 2011-01-14 15:02:03,911
> RowMutationVerbHandler.java
> (line 83) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1004
>
>
> xxx.xxx.xxx.145 grep -i cfid -B 1 log/cassandra.log
> ERROR [MutationStage:1] 2011-01-14 15:02:34,460 RowMutationVerbHandler.java
> (line 83) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1004
> --
> ERROR [MutationStage:13] 2011-01-14 15:03:28,637
> RowMutationVerbHandler.java
> (line 83) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1004
> --
> ERROR [MutationStage:27] 2011-01-14 15:05:02,513
> RowMutationVerbHandler.java
> (line 83) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1004
> --
> ERROR [MutationStage:4] 2011-01-14 15:12:30,731 RowMutationVerbHandler.java
> (line 83) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1004
> --
> ERROR [MutationStage:23] 2011-01-14 15:47:03,416
> RowMutationVerbHandler.java
> (line 83) Error in row mutation
> org.apache.cassandra.db.UnserializableColumnFamilyException:
> Couldn't find cfId=1013
>
>
>
> Q. What does this mean for consistency? Am I still within my guarantee
> of CL=ONE?
>
>
>
> NOTE: I experienced similar exceptions in 0.7-rc2, but at that time the
> cfIds looked corrupted. They were random/negative, and these exceptions
> were followed by an OOM with an attempt to allocate a huge HeapByteBuffer.
>
> Thank you very much,
> Oleg
>
>
>
>
