There are a few things you can do here that might help.
First off, if you're using the default heap settings, that's a serious problem. If you've got the headroom, my recommendation is a 16GB heap with a 12GB new gen, and pin your memtable heap space to 2GB. Set your max tenuring threshold to 6 and your survivor ratio to 6. You don't need a lot of old gen space with Cassandra; almost everything that shows up there is memtable related, and we allocate a *lot* whenever we read data off disk.
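As a rough sketch, those settings map onto `jvm.options` and `cassandra.yaml` something like the below. This assumes the ParNew/CMS collector (`-Xmn`, survivor ratio, and tenuring threshold are CMS-era knobs; don't set a fixed new gen size with G1), and exact file and option names vary by Cassandra version, so treat it as a starting point, not gospel:

```
# jvm.options -- fixed 16GB heap, 12GB new gen
-Xms16G
-Xmx16G
-Xmn12G

# Let objects survive up to 6 young-gen collections before promotion,
# and size survivor spaces at 1/8 of the new gen (ratio 6 => eden:survivor = 6:1:1)
-XX:MaxTenuringThreshold=6
-XX:SurvivorRatio=6
```

```
# cassandra.yaml -- cap on-heap memtable space at 2GB
memtable_heap_space_in_mb: 2048
```

The idea is that memtables are the main long-lived on-heap data, so capping them at 2GB means the 4GB old gen is plenty, while the big new gen absorbs the short-lived allocation churn from reads.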
Most folks use the default disk readahead setting of 128KB. You can check this with blockdev --report, under the RA column; you'll see 256 there, which is in 512-byte sectors (256 × 512 bytes = 128KB). Materialized views (MVs) rely on a read before a write, so for every read off disk you pull an additional 128KB into your page cache. This is usually a waste and puts WAY too much pressure on your disk. On SSDs, I always change this to 4KB.
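Concretely, checking and changing it looks something like this (run as root; `/dev/sda` is a placeholder for whatever device holds your data directories):

```
# Show current readahead -- the RA column is in 512-byte sectors,
# so 256 sectors = 128KB.
blockdev --report /dev/sda

# Drop readahead to 4KB (8 sectors x 512 bytes).
blockdev --setra 8 /dev/sda

# Equivalent via sysfs, in KB:
echo 4 > /sys/block/sda/queue/read_ahead_kb
```

Note that blockdev changes don't survive a reboot, so you'll want to persist the setting (a udev rule or an init script, depending on your distro).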
I have some slides showing pretty good performance improvements from the two changes above. Specifically, I went from 16K reads/second at 180ms p99 latency up to 63K reads/second at 21ms p99, and disk usage dropped by a factor of 10. Throw in the JVM changes I recommended and things should improve even further.
Generally speaking, I recommend avoiding MVs, as they can be a giant landmine if you aren't careful. They're not doing any magic behind the scenes that makes scaling easier, and in a lot of cases they're a hindrance. You still need to understand the underlying data and how it's laid out to use them properly, which is 99% of the work.