cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5982) OutOfMemoryError when writing text blobs to a very large number of tables
Date Tue, 17 Sep 2013 17:16:54 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769696#comment-13769696 ]

Jonathan Ellis commented on CASSANDRA-5982:
-------------------------------------------

bq. it may be better to change preExecutor -> flushwriter -> postExecuter chain with something different that correctly handles errors thrown.

I'll just drop this for now.
                
> OutOfMemoryError when writing text blobs to a very large number of tables
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5982
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5982
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.2.10, 2.0.1
>
>         Attachments: 2000CF_memtable_mem_usage.png, system.log.gz
>
>
> This test goes outside the norm for Cassandra, creating ~2000 column families and writing large text blobs to them.
> The process goes like this:
> Bring up a 6 node m2.2xlarge cluster on EC2. This instance type has enough memory (34.2GB) that Cassandra will allocate a full 8GB heap without tuning cassandra-env.sh. However, this instance type only has a single drive, so data and commitlog are commingled. (This test has also been run on m1.xlarge instances, which have four drives but less memory, and has exhibited similar results when assigning one drive to the commitlog and three to datafile_directories.)
> Use the 'memtable_allocator: HeapAllocator' setting from CASSANDRA-5935.
> Create 2000 CFs:
> {code}
> CREATE KEYSPACE cf_stress WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> CREATE COLUMNFAMILY cf_stress.tbl_00000 (id timeuuid PRIMARY KEY, val1 text, val2 text, val3 text);
> -- repeat for tbl_00001, tbl_00002 ... tbl_02000
> {code}
> This process of creating tables takes a long time, about 5 hours, but anyone wanting to create that many tables presumably only needs to do it once, so this may be acceptable.
> Write data:
> The test dataset consists of writing 100K, 1M, and 10M documents to these tables:
> {code}
> INSERT INTO {table_name} (id, val1, val2, val3) VALUES (?, ?, ?, ?)
> {code}
> With 5 threads doing these inserts across the cluster, indefinitely, randomly choosing a table number 1-2000, the cluster eventually topples over with 'OutOfMemoryError: Java heap space'.
> A heap dump analysis indicates that it's mostly memtables:
> !2000CF_memtable_mem_usage.png!
> The best current theory is that this is commitlog-bound and that the memtables cannot flush fast enough due to locking issues. But I'll let [~jbellis] comment more on that.
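
The schema-creation and insert steps quoted above can be sketched as statement builders. This is a hypothetical reconstruction of the test harness, not the original script: the function names and the use of Python are assumptions, and the driver `session.execute(...)` calls are left as comments so the builders stand alone without a live cluster.

```python
# Hypothetical sketch of the CASSANDRA-5982 workload: build the ~2000
# CREATE statements and the randomized prepared INSERTs described above.
# Table naming (tbl_00000 ...) and the 1-2000 random range follow the
# issue description; all other names are illustrative.
import random

NUM_TABLES = 2000   # "randomly choosing a table number 1-2000"
NUM_THREADS = 5     # "With 5 threads doing these inserts"

def create_table_stmt(n: int) -> str:
    """CREATE statement for table n, matching the tbl_00000 naming scheme."""
    return (f"CREATE COLUMNFAMILY cf_stress.tbl_{n:05d} "
            f"(id timeuuid PRIMARY KEY, val1 text, val2 text, val3 text);")

def random_insert_stmt(rng: random.Random) -> str:
    """Prepared-style INSERT against a randomly chosen table, 1-2000."""
    n = rng.randrange(1, NUM_TABLES + 1)
    return (f"INSERT INTO cf_stress.tbl_{n:05d} "
            f"(id, val1, val2, val3) VALUES (?, ?, ?, ?)")

if __name__ == "__main__":
    # Schema setup: one statement per table.
    # In the real test each string would go to session.execute(...).
    statements = [create_table_stmt(i) for i in range(NUM_TABLES)]
    print(statements[0])
    # Each of the NUM_THREADS workers would loop indefinitely, preparing
    # and executing random_insert_stmt(...) with large text blob values.
    print(random_insert_stmt(random.Random()))
```

The OOM behavior does not depend on the harness details: any client that keeps all 2000 memtables receiving large text values faster than they can flush should reproduce it.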

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
