cassandra-commits mailing list archives

From "Cheng Ren (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-8518) Cassandra Query Request Size Estimator
Date Fri, 19 Dec 2014 05:51:13 GMT
Cheng Ren created CASSANDRA-8518:
------------------------------------

             Summary: Cassandra Query Request Size Estimator
                 Key: CASSANDRA-8518
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8518
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Cheng Ren


For a long time we have been suffering from Cassandra node crashes caused by out-of-memory (OOM) errors. The
heap dump from the most recent crash shows 22 native transport request threads serving queries,
each of which consumes 3.3% of the heap, together taking more than 70%.
Heap dump:
!https://dl-web.dropbox.com/get/attach1.png?_subject_uid=303980955&w=AAAVOoncBoZ5aOPbDg2TpRkUss7B-2wlrnhUAv19b27OUA|height=400,width=600!
Expanded view of one thread:
!https://dl-web.dropbox.com/get/Screen%20Shot%202014-12-18%20at%204.06.29%20PM.png?_subject_uid=303980955&w=AACUO4wrbxheRUxv8fwQ9P52T6gBOm5_g9zeIe8odu3V3w|height=400,width=600!

The Cassandra version we are currently using (2.0.4) uses MemoryAwareThreadPoolExecutor as the request
executor and provides a default request size estimator that always returns 1, which means
it limits only the number of requests pushed to the pool, not the memory they consume. To gain
finer-grained control over request handling and better protect our nodes from OOM errors, we
propose implementing a more precise estimator.
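To illustrate why an estimator that always returns 1 bounds only the request count, here is a minimal Java sketch of byte-budget admission control. It is not Netty's MemoryAwareThreadPoolExecutor or any Cassandra class: ByteBudgetLimiter, tryAdmit and release are hypothetical names, and for simplicity the sketch rejects a request that does not fit instead of blocking the caller.

```java
import java.util.concurrent.Semaphore;
import java.util.function.ToIntFunction;

// Hypothetical sketch: admit requests against a byte budget rather than a request count.
// ByteBudgetLimiter, tryAdmit and release are illustrative names, not Netty or Cassandra APIs.
final class ByteBudgetLimiter<T> {
    private final Semaphore budget;            // bytes that may still be admitted
    private final int maxBytes;
    private final ToIntFunction<T> estimator;  // an estimator returning 1 degrades this to a count limit

    ByteBudgetLimiter(int maxBytes, ToIntFunction<T> estimator) {
        this.maxBytes = maxBytes;
        this.budget = new Semaphore(maxBytes);
        this.estimator = estimator;
    }

    /** Reserves the estimated size and returns it, or returns -1 if the request does not fit right now. */
    int tryAdmit(T request) {
        // Clamp to [1, maxBytes] so an oversized request can still run once the budget is free.
        int cost = Math.max(1, Math.min(estimator.applyAsInt(request), maxBytes));
        return budget.tryAcquire(cost) ? cost : -1;
    }

    /** Returns the reserved bytes once the request and its response have been fully processed. */
    void release(int reservedBytes) {
        budget.release(reservedBytes);
    }

    int availableBytes() {
        return budget.availablePermits();
    }
}
```

With `s -> 1` as the estimator, a budget of N admits N requests no matter how large each one is; with a real size estimator, the same budget bounds the bytes in flight.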

Here are our two cents:
For update/delete/insert requests: the size could be estimated by summing the sizes of all class
members.
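A minimal sketch of the "sum of all class members" idea for writes follows. It assumes a hypothetical, simplified WriteRequest (Cassandra's actual mutation classes look different) and an assumed flat 16-byte per-object overhead rather than a measured JVM object layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical, simplified write request; Cassandra's real mutation classes differ.
final class WriteRequest {
    final String keyspace;
    final String table;
    final ByteBuffer partitionKey;
    final List<ByteBuffer> columnNames;
    final List<ByteBuffer> columnValues; // empty for deletes

    WriteRequest(String keyspace, String table, ByteBuffer partitionKey,
                 List<ByteBuffer> columnNames, List<ByteBuffer> columnValues) {
        this.keyspace = keyspace;
        this.table = table;
        this.partitionKey = partitionKey;
        this.columnNames = columnNames;
        this.columnValues = columnValues;
    }
}

final class WriteSizeEstimator {
    // Assumed flat per-object overhead in bytes; a real estimator would measure the JVM layout.
    static final int OBJECT_OVERHEAD = 16;

    /** Estimates the request size by adding up the sizes of all of its members. */
    static long estimate(WriteRequest r) {
        long size = OBJECT_OVERHEAD; // the request object itself
        size += utf8Length(r.keyspace) + utf8Length(r.table);
        size += bufferSize(r.partitionKey);
        for (ByteBuffer name : r.columnNames) size += bufferSize(name);
        for (ByteBuffer value : r.columnValues) size += bufferSize(value);
        return size;
    }

    private static long utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    private static long bufferSize(ByteBuffer b) {
        return OBJECT_OVERHEAD + b.remaining(); // buffer object plus its readable payload
    }
}
```

For example, a request to keyspace `ks`, table `t` with an 8-byte key and one column `c1` holding 100 bytes is estimated at 16 + 2 + 1 + 24 + 18 + 116 = 177 bytes under these assumptions.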

For scan queries, most of the memory goes to the response, which can be estimated from historical
data. For example, when we receive a scan query on a column family (cf) for a certain token
range, we record its response size and use it as the estimated response size for later scan
queries on the same cf.
For future requests on the same cf, the response size could be calculated as (token range * recorded
size / recorded token range). The request size would then be estimated as (query size + estimated
response size).
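The history-based scan estimate could be sketched as below. ScanResponseEstimator and its methods are hypothetical; the sketch keeps only the most recent observation per cf, as described above, and falls back to an assumed default before any history exists. Token-range widths are held in BigInteger because, depending on the partitioner, they may not fit in a long.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a history-based scan size estimator (requires Java 16+ for records).
final class ScanResponseEstimator {
    // Most recent observation for a cf: width of the scanned token range and the response size in bytes.
    private record Observation(BigInteger tokenRangeWidth, long responseBytes) {}

    private final Map<String, Observation> lastByCf = new ConcurrentHashMap<>();
    private final long defaultResponseBytes; // assumed fallback when a cf has no history yet

    ScanResponseEstimator(long defaultResponseBytes) {
        this.defaultResponseBytes = defaultResponseBytes;
    }

    /** Records the actual response size of a completed scan; empty ranges are ignored to avoid dividing by zero. */
    void record(String cf, BigInteger tokenRangeWidth, long responseBytes) {
        if (tokenRangeWidth.signum() > 0) {
            lastByCf.put(cf, new Observation(tokenRangeWidth, responseBytes));
        }
    }

    /** token range * recorded size / recorded token range, or the default if there is no history. */
    long estimateResponseBytes(String cf, BigInteger tokenRangeWidth) {
        Observation o = lastByCf.get(cf);
        if (o == null) return defaultResponseBytes;
        BigInteger estimate = tokenRangeWidth
                .multiply(BigInteger.valueOf(o.responseBytes()))
                .divide(o.tokenRangeWidth());
        return estimate.min(BigInteger.valueOf(Long.MAX_VALUE)).longValue(); // clamp instead of overflowing
    }

    /** Request size = query size + estimated response size. */
    long estimateRequestBytes(String cf, BigInteger tokenRangeWidth, long queryBytes) {
        return queryBytes + estimateResponseBytes(cf, tokenRangeWidth);
    }
}
```

Keeping only the last observation is the simplest reading of the proposal; averaging several observations (for example with an exponentially weighted mean) would make the estimate less sensitive to a single unusually sparse or dense range.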

We believe what we are proposing here could be useful to others in the Cassandra community
as well. We would appreciate your feedback; please let us know if you have any concerns
or suggestions regarding this proposal.

Thanks,
Cheng




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
