I was playing with a single-node Cassandra installation when I discovered that a request like [SELECT COUNT(*) FROM CF] seems to load the entire dataset of CF into RAM.

This is indeed the case: the whole CF will be loaded into memory. It is currently a known limitation of Cassandra 1.2. This will be fixed in Cassandra 2.0, but the fix requires some groundwork (done in https://issues.apache.org/jira/browse/CASSANDRA-4415) that is too complex to backport to 1.2. So avoid those count queries for now unless you know the data set is small.
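
If you do need a count on a larger CF in the meantime, one possible workaround is to count on the client side, a page of keys at a time, so that no single request has to hold the whole CF. The sketch below is only an illustration under assumptions, not something Cassandra ships: count_in_pages, fetch_page and make_fake_fetch_page are hypothetical names. Against a real cluster, fetch_page might run something like SELECT key FROM CF LIMIT 1000 for the first page and SELECT key FROM CF WHERE token(key) > token(?) LIMIT 1000 for the following ones, where "key" stands in for whatever your partition key column is called. The fake below walks an in-memory sorted list instead, purely to exercise the loop.

```python
import bisect
from typing import Callable, List, Optional


def count_in_pages(
    fetch_page: Callable[[Optional[str], int], List[str]],
    page_size: int = 1000,
) -> int:
    """Count rows by fetching at most page_size keys per request.

    fetch_page(last_key, limit) is a hypothetical callback: it must return
    up to `limit` keys that come strictly after `last_key` in the store's
    iteration order (or the first keys when last_key is None).
    """
    total = 0
    last_key = None
    while True:
        keys = fetch_page(last_key, page_size)
        total += len(keys)
        # A short (or empty) page means we reached the end of the data.
        if len(keys) < page_size:
            return total
        last_key = keys[-1]


def make_fake_fetch_page(all_keys: List[str]) -> Callable[[Optional[str], int], List[str]]:
    """In-memory stand-in for a real paged query, for demonstration only.

    It iterates keys in sorted order; a real cluster would iterate in
    token order instead, but the paging loop is the same.
    """
    ordered = sorted(all_keys)

    def fetch_page(last_key: Optional[str], limit: int) -> List[str]:
        start = 0 if last_key is None else bisect.bisect_right(ordered, last_key)
        return ordered[start:start + limit]

    return fetch_page
```

Only one page of keys is held in memory at a time, at the cost of one round trip per page. The result is not a consistent snapshot if rows are written or deleted while the loop runs.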


As far as I understand, a counting request works roughly the same way as [SELECT * FROM], with the only difference being that it doesn't return any data back. Is my reasoning correct?

That part is pretty much correct. If you do SELECT * FROM CF (without any WHERE clause, that is), it will also load the whole CF into memory, and I would bet that it OOMs as well (if the count(*) OOMs).
 
--
Sylvain