cassandra-commits mailing list archives

From "Sandeep Tata (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large
Date Mon, 16 Mar 2009 20:01:50 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682437#action_12682437 ]

Sandeep Tata commented on CASSANDRA-7:
--------------------------------------

Another way to check that this is really a bug is to list the columns in the serialized SSTable.
You will notice a large contiguous range of missing columns. The trunk does not have a "show
SSTable" utility -- mine depends on a bunch of other code, but I'll try to put one up soon.

Opening up the SSTable in a binary file viewer might be enough -- you'll see a large swath
of zeroes in the middle where real data should be.
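
In the meantime, here is a minimal standalone sketch of that binary check -- not the utility
mentioned above, just a scan of an SSTable data file for long runs of zero bytes, which is
what the hole looks like on disk per the description above. The 64 KB threshold is an
arbitrary choice:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class ZeroRunScanner {
        public static void main(String[] args) throws IOException {
            final long threshold = 64 * 1024;  // flag runs of 64 KB or more
            BufferedInputStream in =
                    new BufferedInputStream(new FileInputStream(args[0]));
            long offset = 0, runStart = 0, runLength = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (b == 0) {
                    if (runLength == 0) runStart = offset;  // run begins here
                    runLength++;
                } else {
                    if (runLength >= threshold)
                        System.out.println("zero run at offset " + runStart
                                + ", length " + runLength);
                    runLength = 0;
                }
                offset++;
            }
            if (runLength >= threshold)  // run extending to end of file
                System.out.println("zero run at offset " + runStart
                        + ", length " + runLength);
            in.close();
        }
    }

Run it against the SSTable data file; a healthy file full of 1 MB values should not report
multi-megabyte stretches of zeroes.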



> Cassandra silently loses data when a single row gets large
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: code in trunk, Red Hat 4.1.2-33, Linux version 2.6.23.1-42.fc8, java version "1.7.0-nio2"
>            Reporter: Sandeep Tata
>            Priority: Critical
>         Attachments: BigReadWriteTest.java
>
>
> When you insert a large number of columns in a single row, Cassandra silently loses some of these inserts.
> This does not happen until the cumulative size of the columns in a single row exceeds several megabytes.
> Say each value is 1 MB in size:
> insert("row", "col0", value, timestamp)
> insert("row", "col1", value, timestamp)
> insert("row", "col2", value, timestamp)
> ...
> ...
> insert("row", "col100", value, timestamp)
> Running: 
> get_column("row", "col0")
> get_column("row", "col1")
> ...
> ..
> get_column("row", "col100")
> The sequence of get_column calls will fail at some point before reaching col100. This was also a problem with the old code on code.google.
> I will attach a small program that will help you reproduce this. 
> 1. This only happens when the cumulative size of the row exceeds several megabytes. 
> 2. In fact, the single row must be large enough to trigger an SSTable flush for this error to occur.
> 3. No OutOfMemory errors are thrown, and there is nothing relevant in the logs.
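
For reference, the insert/read-back loop quoted above boils down to roughly the sketch
below. The Client interface is a hypothetical stand-in for the Thrift-generated client
(the real method names and signatures differ), and the in-memory stub in main() exists
only so the sketch compiles and runs on its own -- wire run() to a real client against a
Cassandra node to actually reproduce the loss:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    public class BigRowRepro {
        // Hypothetical stand-in mirroring the pseudocode in the description.
        interface Client {
            void insert(String row, String column, byte[] value, long timestamp);
            byte[] getColumn(String row, String column);  // null if missing
        }

        static void run(Client client) {
            byte[] value = new byte[1024 * 1024];           // 1 MB per column
            Arrays.fill(value, (byte) 'x');
            long ts = System.currentTimeMillis();
            for (int i = 0; i <= 100; i++)                  // ~100 MB in one row
                client.insert("row", "col" + i, value, ts);
            for (int i = 0; i <= 100; i++) {                // read back in order
                if (client.getColumn("row", "col" + i) == null) {
                    System.out.println("data lost starting at col" + i);
                    return;
                }
            }
            System.out.println("all columns read back intact");
        }

        public static void main(String[] args) {
            // In-memory stub: never loses data, so this standalone run prints
            // "all columns read back intact". Against a real node, the read-back
            // loop fails partway through, as described in the issue.
            final Map<String, byte[]> store = new HashMap<String, byte[]>();
            run(new Client() {
                public void insert(String row, String col, byte[] v, long ts) {
                    store.put(row + "/" + col, v);
                }
                public byte[] getColumn(String row, String col) {
                    return store.get(row + "/" + col);
                }
            });
        }
    }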

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

