cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4305) CF serialization failure when working with custom secondary indices.
Date Fri, 08 Jun 2012 11:28:23 GMT


Pavel Yaskevich commented on CASSANDRA-4305:

Ok, it is kind of pointless to argue about what can happen in the future, but even your
examples make a strong case for guaranteeing RM integrity if we are going to send it to yet
another thread or require it in the CL. Otherwise we risk persisting corrupted data at some
point (we don't have a mechanism to reject modifications): as the amount of processing in
Table.apply grows, so does the probability of unnoticed corruption, e.g. when secondary index
code modifies a CF or its columns by mistake while racing with triggers or the CL, which would
lead to a very bad situation. Even if we somehow "optimize so that serialize the RM directly
to the file (to avoid a copy)", we still need to convert it into writable form, don't we? And
that's where we would have to make a hundred and five assertions just to notice whether the
calculated size matches the actual data size (like we do in FBUtilities.serialize()), because
we would race with other components using the same mutation. For example, we no longer have
full control over the indexing code, and even if the corruption is not our mistake per se, we
share a good part of the blame because our design decisions let it happen, which in turn
would make a negative impression overall.
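To make the size-check point concrete, here is a minimal, self-contained Java sketch (not Cassandra code) of a "predict the size, then serialize, then compare" check in the spirit of the assertion in FBUtilities.serialize(). SketchMutation and SizeCheckedSerializer are hypothetical names, and the `between` hook is an assumption used to simulate another component modifying the same mutation after its size was computed but before its bytes were written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutation: a list of column values that other
// components (e.g. index code) could still modify after it has been queued.
class SketchMutation {
    final List<byte[]> columns = new ArrayList<>();

    // Predicted size: 4-byte column count + (4-byte length + payload) per column.
    int serializedSize() {
        int size = 4;
        for (byte[] c : columns) size += 4 + c.length;
        return size;
    }

    void serialize(DataOutputStream out) throws IOException {
        out.writeInt(columns.size());
        for (byte[] c : columns) {
            out.writeInt(c.length);
            out.write(c);
        }
    }
}

class SizeCheckedSerializer {
    // Computes the predicted size first, serializes, then fails loudly if the
    // written length differs. `between` runs after the size is computed and
    // before serialization, simulating a concurrent modification.
    static byte[] serialize(SketchMutation m, Runnable between) {
        int predicted = m.serializedSize();
        between.run();
        ByteArrayOutputStream bos = new ByteArrayOutputStream(predicted);
        try {
            DataOutputStream out = new DataOutputStream(bos);
            m.serialize(out);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] data = bos.toByteArray();
        if (data.length != predicted) {
            throw new IllegalStateException(
                "data size " + data.length + " (predicted " + predicted + ")");
        }
        return data;
    }
}
```

With nothing running in `between` the sizes agree; adding a column there makes the written data larger than predicted and the check fails, which is the same shape of failure as the one in this issue's stack trace (data larger than predicted).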

bq. Furthermore, I have doubt that cloning the CF you're reusing before passing them to RM
in your 2ndary index code will have a measurable impact on performance (though if you have
numbers to show that it does make a noticeable difference, then it's a different discussion).

This is a double standard: why do we try so hard to avoid making one copy for serialization,
but instead require the secondary index to clone, possibly, every CF, and to do that at the
same stage of the write path? I'm talking about cfs.indexManager.applyIndexUpdates() in
Table.apply, for example.
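For comparison, the defensive copy being asked of the index side looks roughly like the sketch below. SketchColumnFamily and IndexWriterSketch are hypothetical classes, not the actual ColumnFamily API, and the list standing in for the queue is an assumption: every column value is copied before the CF is handed off, which is the per-CF cost on the write path referred to above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column family: a mutable list of column values that index
// code might reuse across batches.
class SketchColumnFamily {
    final List<byte[]> columns = new ArrayList<>();

    // Deep copy: a new list and new byte arrays, so later changes to this
    // instance cannot leak into the copy.
    SketchColumnFamily deepCopy() {
        SketchColumnFamily copy = new SketchColumnFamily();
        for (byte[] c : columns) copy.columns.add(c.clone());
        return copy;
    }
}

class IndexWriterSketch {
    // Hypothetical hand-off: the list stands in for anything that keeps a
    // reference after this method returns (another thread, the commit log).
    static void submit(SketchColumnFamily reused, List<SketchColumnFamily> queue) {
        queue.add(reused.deepCopy());
    }
}
```

The copy isolates the queued CF from later reuse, but it is paid once per CF on every write, which is why it is worth weighing against a single serialization copy.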
> CF serialization failure when working with custom secondary indices.
> --------------------------------------------------------------------
>                 Key: CASSANDRA-4305
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.10
>            Reporter: Pavel Yaskevich
>              Labels: datastax_qa
>         Attachments: CASSANDRA-4305.patch
> The assertion below was triggered when a client was adding new rows to Solr-backed secondary
indices (a 1000-row batch without any timeout).
> {noformat}
> ERROR [COMMIT-LOG-WRITER] 2012-05-30 16:39:02,896 (line
139) Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
> java.lang.AssertionError: Final buffer length 176 to accomodate data size of 123 (predicted
87) for RowMutation(keyspace='solrTest1338395932411', key='6b6579383039', modifications=[ColumnFamily(cf1
>         at org.apache.cassandra.utils.FBUtilities.serialize(
>         at org.apache.cassandra.db.RowMutation.getSerializedBuffer(
>         at org.apache.cassandra.db.commitlog.CommitLogSegment.write(
>         at org.apache.cassandra.db.commitlog.CommitLog$
>         at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(
>         at
>         at
> {noformat}
> After investigation it was clear that it was happening because we were holding RowMutation
instances queued for addition to the CommitLog until the actual "write" moment, which is redundant.
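One way to stop holding live RowMutation instances until the write moment is to serialize at enqueue time and queue only the bytes. The sketch below assumes that approach (it is not necessarily what the attached patch does); EagerCommitLogSketch is a hypothetical class, and columns are modeled as plain strings for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EagerCommitLogSketch {
    // Holds serialized snapshots, never live mutation objects.
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();

    // Encodes each column once, then sizes the buffer from the encoded
    // values, so predicted and actual sizes agree by construction.
    static byte[] serialize(List<String> columns) {
        List<byte[]> encoded = new ArrayList<>();
        int size = 4; // column count
        for (String c : columns) {
            byte[] b = c.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            size += 4 + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(encoded.size());
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    // Serializes on the submitting thread; assumes the caller does not modify
    // `columns` concurrently during this call.
    void enqueue(List<String> columns) {
        pending.add(serialize(columns));
    }

    // What a commit log writer thread would consume; null if nothing is queued.
    byte[] pollNext() {
        return pending.poll();
    }
}
```

After enqueue() returns, changes to the caller's list no longer affect the queued entry, because the writer thread only ever sees the byte snapshot.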
