cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8399) Reference Counter exception when dropping user type
Date Mon, 29 Dec 2014 18:30:13 GMT


Benedict commented on CASSANDRA-8399:

bq.  having two methods to accomplish the same thing is error prone, considerably more difficult
to maintain and reason about

Given how awfully error prone the ref counting has proven itself to be (see the number of
linked issues in CASSANDRA-7705), and the age-old maxim that the simplest explanation is the
best, it is pretty reasonable to think that ref counting is simply hard to get right by itself.
There isn't much evidence that the OpOrder makes this appreciably harder. Personally, I think
the more places we can use OpOrder the better since it is easier to reason about, having a
simple open/close for the entire operation. But this is fairly off topic.

bq. I agree. At the time I first changed this code, I wasn't aware of the OpOrder -> SSTableScanner
protection relationship so the design was more appropriate.

I was not attempting to criticise your work, but the current (and historic) state of this
class in abstraction. The presence of reference counting here seems testament to the problematic
relationship we have with reference counting as a whole, since there seems to be no good reason
for it - that is unless we expect to construct a scanner, finish the protected operation that
opened it and then pass that scanner to another unprotected operation. But this would seem
to me to be an avoidable and dangerous design that should itself by avoided rather than introducing
unconditional (and hence dangerous) reference counting, or making assumptions about correct
behaviour if the count is unobtainable.

bq. he change in the CompactionTask with CASSANDRA-7932 uncovered the fact that we were relying
on an undocumented ordering relationship between SSTR closing and SSTableScanner closing for
the operation to work on Windows... relationship needs to be formalized 

I'm not familiar with the ordering relationship you're alluding to, but formalisation is always
a good thing.

> Reference Counter exception when dropping user type
> ---------------------------------------------------
>                 Key: CASSANDRA-8399
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Philip Thompson
>            Assignee: Joshua McKenzie
>             Fix For: 2.1.3
>         Attachments: 8399_fix_empty_results.txt, 8399_v2.txt, node2.log, ubuntu-8399.log
> When running the dtest {{}}
with the current 2.1-HEAD code, very frequently, but not always, when dropping a type, the
following exception is seen:{code}
> ERROR [MigrationStage:1] 2014-12-01 13:54:54,824 - Exception
in thread Thread[MigrationStage:1,5,main]
> java.lang.AssertionError: Reference counter -1 for /var/folders/v3/z4wf_34n1q506_xjdy49gb780000gn/T/dtest-eW2RXj/test/node2/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-sche
> ma_keyspaces-ka-14-Data.db
>         at
>         at
>         at org.apache.cassandra.utils.MergeIterator.close( ~[main/:na]
>         at org.apache.cassandra.db.ColumnFamilyStore$8.close(
>         at org.apache.cassandra.db.ColumnFamilyStore.filter(
>         at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(
>         at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(
>         at org.apache.cassandra.db.SystemKeyspace.serializedSchema(
>         at org.apache.cassandra.db.SystemKeyspace.serializedSchema(
>         at org.apache.cassandra.config.Schema.updateVersion( ~[main/:na]
>         at org.apache.cassandra.config.Schema.updateVersionAndAnnounce(
>         at org.apache.cassandra.db.DefsTables.mergeSchema( ~[main/:na]
>         at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(
>         at ~[main/:na]
>         at java.util.concurrent.Executors$ ~[na:1.7.0_67]
>         at ~[na:1.7.0_67]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
>         at java.util.concurrent.ThreadPoolExecutor$
>         at [na:1.7.0_67]{code}
> Log of the node with the error is attached.

This message was sent by Atlassian JIRA

View raw message