cassandra-commits mailing list archives

From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5225) Missing columns, errors when requesting specific columns from wide rows
Date Thu, 07 Feb 2013 17:25:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573682#comment-13573682
] 

Yuki Morishita commented on CASSANDRA-5225:
-------------------------------------------

It looks like Cassandra is reading from the wrong column index here (https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/columniterator/SSTableNamesIterator.java#L236).

Suppose we have column indexes of [[1..5][6..10][11..15][16..20]] (numbers are column names),
and we want to 'SELECT 2, 18 FROM CF'.
First, we check '2' against the indexes and get indexes[0]. Next, we check '18' against the indexes
with a lastIndexIdx of 0.
Because the second index check is limited to the sublist indexes[0, lastIndexIdx + 1] here
(https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/io/sstable/IndexHelper.java#L186),
it checks against only the first two indexes and returns the wrong index position, indexes[2].
So it thinks '18' is not in the sstable.

In fact, when I removed the sublisting from IndexHelper.indexFor, SSTableNamesIterator started
returning correct values. But I don't know whether that's the right fix. [~slebresne]?
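
To make the example above concrete, here is a minimal sketch of the lookup in Java. It is a simplified
model, not the actual IndexHelper.indexFor code: the class IndexLookupSketch, the Block type, and the
methods indexForLimited and indexForFull are hypothetical names, column names are plain ints, and the
sublist bounds (indexes 0 through lastIndexIdx + 1) follow the description above rather than the real source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of the lookup described above; all names here are hypothetical.
final class IndexLookupSketch {
    /** Stand-in for one column index entry covering column names first..last. */
    static final class Block {
        final int first;
        final int last;
        Block(int first, int last) { this.first = first; this.last = last; }
        boolean contains(int name) { return first <= name && name <= last; }
    }

    /** The example layout: [[1..5][6..10][11..15][16..20]]. */
    static List<Block> exampleBlocks() {
        return Arrays.asList(new Block(1, 5), new Block(6, 10), new Block(11, 15), new Block(16, 20));
    }

    /** Position of the first block whose last name is >= name (the insertion point if none matches). */
    private static int search(List<Block> blocks, int name) {
        List<Integer> lastNames = new ArrayList<>();
        for (Block b : blocks) lastNames.add(b.last);
        int pos = Collections.binarySearch(lastNames, name);
        return pos < 0 ? -pos - 1 : pos;
    }

    /** Assumed buggy variant: only indexes 0..lastIndexIdx + 1 are searched. */
    static int indexForLimited(List<Block> blocks, int name, int lastIndexIdx) {
        int end = Math.min(blocks.size(), lastIndexIdx + 2);
        return search(blocks.subList(0, end), name);
    }

    /** Variant without the sublist: every index entry is searched. */
    static int indexForFull(List<Block> blocks, int name) {
        return search(blocks, name);
    }

    public static void main(String[] args) {
        List<Block> blocks = exampleBlocks();
        int lastIndexIdx = indexForFull(blocks, 2);               // first lookup, for '2'
        int limited = indexForLimited(blocks, 18, lastIndexIdx);  // second lookup, for '18'
        int full = indexForFull(blocks, 18);
        System.out.println("limited: index " + limited + ", contains 18: " + blocks.get(limited).contains(18));
        System.out.println("full:    index " + full + ", contains 18: " + blocks.get(full).contains(18));
    }
}
```

In this model the limited search lands on the block [11..15], which does not contain 18, while the
full search lands on [16..20].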
                
> Missing columns, errors when requesting specific columns from wide rows
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-5225
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5225
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.1
>            Reporter: Tyler Hobbs
>            Priority: Critical
>             Fix For: 1.2.2
>
>         Attachments: pycassa-repro.py
>
>
> With Cassandra 1.2.1 (and probably 1.2.0), I'm seeing some problems with Thrift queries that request a set of specific column names when the row is very wide.
> To reproduce, I'm inserting 10 million columns into a single row and then randomly requesting three columns by name in a loop.  It's common for only one or two of the three columns to be returned.  I'm also seeing stack traces like the following in the Cassandra log:
> {noformat}
> ERROR 13:12:01,017 Exception in thread Thread[ReadStage:76,5,main]
> java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/var/lib/cassandra/data/Keyspace1/CF1/Keyspace1-CF1-ib-5-Data.db, 14035168 bytes remaining)
> 	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1576)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/var/lib/cassandra/data/Keyspace1/CF1/Keyspace1-CF1-ib-5-Data.db, 14035168 bytes remaining)
> 	at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:69)
> 	at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
> 	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
> 	at org.apache.cassandra.db.CollationController.collectTimeOrderedData(CollationController.java:133)
> 	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1358)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1215)
> 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1127)
> 	at org.apache.cassandra.db.Table.getRow(Table.java:355)
> 	at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
> 	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
> 	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1572)
> 	... 3 more
> {noformat}
> This doesn't seem to happen when the row is smaller, so it might have something to do with incremental large row compaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
