Hi, Brian --
A little late to reply, but I'm slowly catching up.
You're going to be better off, IMHO, pulling the data out of Cassandra with a tool like Pig (probably with a bit of aggregation and filtering along the way) and then working on it in R as a static delimited file. If you need more automation or batching (as well as cleaning and aggregation), there are various tools you can script that with. Some of this depends on your modeling workflow, but it's reasonable to expect that you'll want to return to exactly the same dataset and repeat some processes as you refine your approach. That's difficult, if not impossible, to do against live data.
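To make the "static delimited file" idea concrete, here's a minimal Python sketch of the filter, aggregate, export step. It's not Pig and doesn't touch Cassandra. `export_snapshot`, the `(key, value)` row shape, and the `min_value` threshold are all hypothetical stand-ins for whatever your actual extraction produces.

```python
import csv
import io
from collections import defaultdict


def export_snapshot(rows, out, min_value=0):
    """Filter rows, aggregate by key, and write a tab-delimited snapshot.

    `rows` is an iterable of (key, value) pairs, a hypothetical stand-in
    for data already pulled out of Cassandra.
    """
    totals = defaultdict(float)
    for key, value in rows:
        if value >= min_value:      # filtering step: drop rows below the threshold
            totals[key] += value    # aggregation step: sum values per key
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["key", "total"])
    # Sort keys so repeated exports of the same input are byte-identical.
    for key in sorted(totals):
        writer.writerow([key, totals[key]])


rows = [("a", 1.0), ("b", -2.0), ("a", 3.0), ("c", 5.0)]
buf = io.StringIO()
export_snapshot(rows, buf)
print(buf.getvalue())
```

In practice you'd pass an open file instead of a `StringIO`. Sorting the keys means the same input always yields the same file, which is what lets you come back to exactly the same dataset later. On the R side, a tab-delimited file with a header row like this one can be loaded with `read.delim()`.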