cassandra-commits mailing list archives

From "Jonathan Ellis (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3861) get_indexed_slices throws OOM Error when is called with too big indexClause.count
Date Wed, 08 Feb 2012 00:04:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203001#comment-13203001 ]

Jonathan Ellis commented on CASSANDRA-3861:
-------------------------------------------

bq. it feels weird to arbitrarily force people to pretty much always implement paging

But we already are, and we have to until we have our hypothetical streaming query protocol.
 The problem is that our API mixes "query" and "fetch" at the server level.  RDBMS drivers
solved this with the "cursor" abstraction.  So "SELECT * FROM very_large_table" won't OOM
you, but cursor.fetchall() on that query will likely OOM your *client*.

So given that we query + fetch together, increasing count to MAX_VALUE is insane.  (And this
has to be done explicitly--the default is 100.)

In your example above, the "right" thing to do from a client's perspective is to use a limit
of 10000.
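
A minimal sketch of what paging with a bounded count could look like on the client side. Everything here is hypothetical: fetchPage(startKey, count) stands in for whatever call the client library exposes, and the sketch assumes it returns keys in a stable order with an inclusive start key, so each page after the first repeats the last key already seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical client-side helper; not part of any Cassandra client API.
final class PagingSketch {
    // fetchPage.apply(startKey, count) is assumed to return up to `count`
    // keys >= startKey, in a stable order ("" means "from the beginning").
    static List<String> fetchAll(BiFunction<String, Integer, List<String>> fetchPage,
                                 int pageSize) {
        // With an inclusive start key, a page size of 1 would never advance.
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> all = new ArrayList<>();
        String start = "";
        boolean firstPage = true;
        while (true) {
            List<String> page = fetchPage.apply(start, pageSize);
            // Skip the repeated start key on every page after the first.
            int from = firstPage ? 0 : 1;
            for (int i = from; i < page.size(); i++) {
                all.add(page.get(i));
            }
            // A short page means the backend has nothing more to return.
            if (page.size() < pageSize) {
                break;
            }
            start = page.get(page.size() - 1);
            firstPage = false;
        }
        return all;
    }
}
```

Each request stays bounded by pageSize, so neither side has to hold more than one page of results per round trip.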

That said, I agree that it's inefficient to always allocate the limit.  I guess I'd be okay
with dropping that if we add a special check to return IRE (InvalidRequestException) for the
MAX_VALUE antipattern.
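
A rough sketch of that idea, not the attached patch and not the actual StorageProxy code. The class name, the InvalidRequest stand-in for the Thrift IRE, and the cap of 1024 on the initial capacity are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and constants are illustrative only.
final class ScanLimitSketch {
    // Assumed upper bound on the up-front allocation; the list still
    // grows on demand if more rows actually match.
    static final int MAX_INITIAL_CAPACITY = 1024;

    // Stand-in for the Thrift InvalidRequestException (IRE).
    static final class InvalidRequest extends RuntimeException {
        InvalidRequest(String message) {
            super(message);
        }
    }

    // Reject non-positive counts and the MAX_VALUE antipattern up front.
    static void validateCount(int count) {
        if (count <= 0) {
            throw new InvalidRequest("count must be positive");
        }
        if (count == Integer.MAX_VALUE) {
            throw new InvalidRequest("count of Integer.MAX_VALUE is not supported; page instead");
        }
    }

    // Allocate for the expected case rather than for the requested limit.
    static <T> List<T> newResultList(int count) {
        validateCount(count);
        return new ArrayList<>(Math.min(count, MAX_INITIAL_CAPACITY));
    }
}
```

With this shape, a request for 10000 rows that matches only 3 keys allocates a small backing array, while a request for Integer.MAX_VALUE fails fast with an error to the client instead of taking the node down.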
                
> get_indexed_slices throws OOM Error when is called with too big indexClause.count
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3861
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3861
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API, Core
>    Affects Versions: 1.0.7
>            Reporter: Vladimir Tsanev
>            Assignee: Sylvain Lebresne
>             Fix For: 1.0.8
>
>         Attachments: 3861.patch
>
>
> I tried to call get_indexed_slices with Integer.MAX_VALUE as IndexClause.count. Unfortunately
the node died with OOM. In the log there is the following error:
> ERROR [Thrift:4] 2012-02-06 17:43:39,224 Cassandra.java (line 3252) Internal error processing get_indexed_slices
> java.lang.OutOfMemoryError: Java heap space
> 	at java.util.ArrayList.<init>(ArrayList.java:112)
> 	at org.apache.cassandra.service.StorageProxy.scan(StorageProxy.java:1067)
> 	at org.apache.cassandra.thrift.CassandraServer.get_indexed_slices(CassandraServer.java:746)
> 	at org.apache.cassandra.thrift.Cassandra$Processor$get_indexed_slices.process(Cassandra.java:3244)
> 	at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
> 	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Is it necessary to allocate all the memory in advance? I only have 3 KEYS that match
my clause. I do not know the exact number, but in general I am sure that they will fit in
memory.
> I can/will implement some calls with paging, but wanted to test and I am not happy with
the fact the node disconnected.
> I wonder why ArrayList is used here?
> I think the result is never accessed by index (but only iterated) and the subList for
non RandomAccess Lists (for example LinkedList) will do the same job if you are not using
other operations than iteration.
> Is this related to the problem described in CASSANDRA-691?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
