cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-208) jvm crashes intermittently during compaction
Date Tue, 02 Jun 2009 02:07:07 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715334#action_12715334 ]

Jonathan Ellis commented on CASSANDRA-208:
------------------------------------------

Another related link: http://wiki.apache.org/hadoop/Hbase/NewFileFormat

My take is that the designs are different enough that their reasons for moving to a single
file don't really apply to Cassandra.

 - the old MapFile has a bunch of properties that make it general enough for Hadoop core but
inefficient for HBase (e.g. storing the CF name once per key, keys appearing multiple times
in the index)
 - they only index one key per block, so their index is much, much smaller than ours, and they
can get away with storing the index at the end of the file, as Cassandra currently does

> Even if we take out the row index and BF, data is still mixed with column index. 

Not at the SSTable key/value level.  To SSTable, the value is just a byte[], so the fact that
a CF serializes with indexes is an implementation detail.  (To the degree that SSTable or SF
does care, that is an encapsulation violation, which is one of the reasons this code is one
of the less pleasant parts of Cassandra to work in.)

I will get a patch together that will implement the file splitting I proposed and we will
see how that looks.  I think that's going to get us to a stable 0.3 fastest; if we want to
radically re-think how indexing works (so we can go back to index-at-the-end-of-one-file)
then I think that is a change to make in 0.4.  (The non-sparse index Cassandra uses may be
necessary if you want to support large CF rows; otherwise you will waste too much time scanning
through those rows looking for a key when the index only gets you to within 128 keys of it.)
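
To make the trade-off in that parenthetical concrete, here is a minimal sketch of a sparse index that samples one key every N entries and then scans forward from the nearest sample. The class name SparseIndexSketch, the in-memory list standing in for the data file, and the method names are all hypothetical; this is not the actual SSTable code, only an illustration of why a lookup may have to step over up to N-1 rows (cheap for small rows, expensive for large CF rows).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a sparse key index; not Cassandra's actual SSTable code. */
final class SparseIndexSketch {
    // Stands in for the on-disk data file, which is sorted by key.
    private final List<String> sortedKeys;
    // The sparse index: every interval-th key and its position in the data.
    private final List<String> sampledKeys = new ArrayList<>();
    private final List<Integer> sampledPositions = new ArrayList<>();
    private final int interval;

    SparseIndexSketch(List<String> sortedKeys, int interval) {
        this.sortedKeys = sortedKeys;
        this.interval = interval;
        for (int i = 0; i < sortedKeys.size(); i += interval) {
            sampledKeys.add(sortedKeys.get(i));
            sampledPositions.add(i);
        }
    }

    /** Returns the position of key in the data, or -1 if it is absent. */
    int find(String key) {
        int slot = Collections.binarySearch(sampledKeys, key);
        if (slot < 0) {
            // Not sampled: binarySearch returned -(insertionPoint) - 1,
            // so the last sample <= key sits at insertionPoint - 1.
            slot = -slot - 2;
        }
        if (slot < 0) {
            return -1; // key sorts before the first entry
        }
        int start = sampledPositions.get(slot);
        int end = Math.min(start + interval, sortedKeys.size());
        // Linear scan within the block: this is the cost that grows
        // with row size when each entry is a large row on disk.
        for (int i = start; i < end; i++) {
            int cmp = sortedKeys.get(i).compareTo(key);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where the key would be
            }
        }
        return -1;
    }

    /** Number of entries held in the sparse index. */
    int indexEntries() {
        return sampledKeys.size();
    }
}
```

With an interval of 128, the index holds roughly 1/128th of the keys, but each miss on a sampled key costs a scan of up to 127 entries; an index over every key removes the scan at the price of a much larger index.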

One thing that piqued my curiosity: what is the HBase "row index"?  Looks like their "key
index" is like our sstable indexes (with the difference mentioned above, that it only indexes
one key per block).

> jvm crashes intermittently during compaction
> --------------------------------------------
>
>                 Key: CASSANDRA-208
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-208
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: trunk
>         Environment: arch: x86_64
> os: Linux version 2.6.18-92.1.22.el5 
> java: nio2-ea-bin-b99-linux-x64-05_feb_2009
>            Reporter: Jiansheng Huang
>            Assignee: Jonathan Ellis
>            Priority: Critical
>             Fix For: 0.3
>
>
> jvm crashes intermittently during compaction. Our test data set is not that big, less than 10 GB.
> When jvm is about to crash, we see that it consumes a lot of memory (exceeding the max heap size).
> The excessive memory usage during compaction is caused by the maintenance of blockIndexes_ in SSTable. This blockIndexes_ was only introduced to the apache version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

