cassandra-commits mailing list archives

From "Neophytos Demetriou (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large
Date Mon, 16 Mar 2009 21:59:50 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682482#action_12682482 ]

Neophytos Demetriou commented on CASSANDRA-7:
---------------------------------------------

(a) It happens when you insert a large number of columns in a single row.
(b) Cassandra silently loses some of these inserts (batch inserts are also inserts).
(c) This DOES happen when the threshold is violated (the cumulative size is only one of the
reasons the threshold can be violated).
(d) It also happens while flushing the memtable to disk.

Yes, I can open a new ticket, but it seemed relevant to this issue.
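For reference, below is a rough Java sketch of the kind of workload that triggers the loss. It is written against a hypothetical Client interface that mirrors the insert/get_column pseudocode in the quoted report; the attached BigReadWriteTest.java is the actual reproducer, and the real Thrift method signatures may differ.

import java.util.Arrays;

// Minimal sketch only: "Client" is a hypothetical interface standing in for the
// real Thrift client; it mirrors the insert/get_column pseudocode from the report.
public class LargeRowRepro {

    interface Client {
        void insert(String row, String column, byte[] value, long timestamp);
        byte[] getColumn(String row, String column);
    }

    static void run(Client client) {
        byte[] value = new byte[1 << 20];               // ~1 MB per column value
        Arrays.fill(value, (byte) 'x');

        // Write col0..col100 into the same row (~100 MB total), enough to push the
        // row past the memtable flush threshold described above.
        for (int i = 0; i <= 100; i++) {
            client.insert("row", "col" + i, value, System.currentTimeMillis());
        }

        // Read them back; per the report, some columns come back missing before col100.
        for (int i = 0; i <= 100; i++) {
            byte[] read = client.getColumn("row", "col" + i);
            if (read == null || read.length != value.length) {
                System.out.println("lost or truncated column: col" + i);
            }
        }
    }
}

The essential point is not the particular client library: it is a single row whose cumulative column size crosses the flush threshold before the reads start.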

> Cassandra silently loses data when a single row gets large
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: code in trunk, Red Hat 4.1.2-33, Linux version 2.6.23.1-42.fc8, java version "1.7.0-nio2"
>            Reporter: Sandeep Tata
>            Priority: Critical
>         Attachments: BigReadWriteTest.java, dirty_bit_patch.txt, dirty_bit_patch_v2.txt
>
>
> When you insert a large number of columns in a single row, Cassandra silently loses some of these inserts.
> This does not happen until the cumulative size of the columns in a single row exceeds several megabytes.
> Say each value is 1 MB in size:
> insert("row", "col0", value, timestamp)
> insert("row", "col1", value, timestamp)
> insert("row", "col2", value, timestamp)
> ...
> ...
> insert("row", "col100", value, timestamp)
> Running: 
> get_column("row", "col0")
> get_column("row", "col1")
> ...
> ...
> get_column("row", "col100")
> The sequence of get_column calls will fail at some point before col100. This was also a problem with the old code on code.google.
> I will attach a small program that will help you reproduce this. 
> 1. This only happens when the cumulative size of the row exceeds several megabytes. 
> 2. In fact, the single row must be large enough to trigger an SSTable flush for this error to appear.
> 3. No OutOfMemory errors are thrown, and there is nothing relevant in the logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

