In the case without CQL3, where I would use composite columns, I can see roughly how this lines up with what CQL3 is doing.
I can't use CQL3 because I am using pycassa as my client, so that leaves me with composite columns.
Under composite columns, I would have 1 row with a lot of columns, stored on 1 node. That single node would be hit frequently while the other nodes sat idle (assuming I have it right that a row lives on a single node).
I can then get a slice of columns using the composite through (username,), with a reversed comparator on photo_seq to give me my proper order. As I understand it, that would return the same data as using the primary key, but it would be reading 1 row on 1 node, so unlike the PK solution I would have a hotspot.
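To make the ordering concrete, here is a plain-Python sketch (no Cassandra needed) of how composite column names of the form (username, photo_seq) sort when the photo_seq component uses a reversed comparator; the usernames and sequence values are made up for illustration:

```python
# Composite column names as (username, photo_seq) tuples.
columns = [
    ("alice", 1), ("alice", 3), ("alice", 2),
    ("bob", 5), ("bob", 4),
]

# Reversed comparator on the second component: ascending on username,
# descending on photo_seq, so the newest photo comes first per user.
ordered = sorted(columns, key=lambda c: (c[0], -c[1]))

def slice_for(username, cols, count=10):
    # A slice through (username,) is the contiguous run of columns
    # whose first component equals that username.
    return [c for c in cols if c[0] == username][:count]

print(ordered)
print(slice_for("alice", ordered))
```

The key point is that all of a user's columns are contiguous in one row, which is exactly what makes the slice cheap and the row hot at the same time.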
The PRIMARY KEY solution creates multiple rows that effectively act as 1 wide row, but with the benefit of being distributed across the nodes as independent rows (using username as the partition key), instead of living in 1 row on 1 node.
If my assumptions above are correct, the PK solution is clearly better than the single-row solution. In doing some reading, I came across a solution where you manually partition the row keys to spread the load more evenly. The Cassandra docs talk about this approach under "High Throughput Timelines": http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
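As I understand the manual-partitioning idea, you split one hot row across a fixed number of bucketed row keys so reads and writes spread over several rows (and therefore, typically, several nodes). A minimal sketch, where NUM_BUCKETS and bucketing by photo_seq range are my own illustrative assumptions, not anything prescribed by the blog post:

```python
NUM_BUCKETS = 4

def row_key(username, photo_seq):
    # Deterministic bucket: consecutive photo_seq values land in the
    # same bucket until the range rolls over, so a single slice can
    # still read one bucket at a time.
    bucket = (photo_seq // 1000) % NUM_BUCKETS
    return "%s:%d" % (username, bucket)

def buckets_newest_first(username, latest_seq):
    # To read the most recent photos, query the newest bucket first,
    # then fall back to older buckets until enough columns are found.
    start = (latest_seq // 1000) % NUM_BUCKETS
    return ["%s:%d" % (username, (start - i) % NUM_BUCKETS)
            for i in range(NUM_BUCKETS)]

print(row_key("alice", 2500))            # -> "alice:2"
print(buckets_newest_first("alice", 2500))
```

The trade-off is that a read may now touch up to NUM_BUCKETS rows instead of one, in exchange for no single row being a hotspot.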
Would you advise the manual partition example?
My other option is to store each photo in its own row keyed by its id (or by a canonical id I generate from the photo id and some other factors), and then keep a kind of hybrid index row per username that not only references the photo_id rows but also carries some extra denormalized data to render a result set.
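The hybrid option I mean would look roughly like this sketch, where plain dicts stand in for the two column families and the field names (thumb, caption) are illustrative assumptions:

```python
photos = {}        # photo_id -> full photo record
user_index = {}    # username -> {photo_seq: denormalized summary}

def add_photo(username, photo_seq, photo_id, record, thumb_url):
    photos[photo_id] = record
    user_index.setdefault(username, {})[photo_seq] = {
        "photo_id": photo_id,
        "thumb": thumb_url,   # denormalized: enough to render the list
    }

def recent_page(username, count=10):
    # Newest photo_seq first, mirroring a reversed comparator.
    index = user_index.get(username, {})
    return [index[seq] for seq in sorted(index, reverse=True)[:count]]

add_photo("alice", 1, "p-001", {"caption": "first"}, "t/p-001.jpg")
add_photo("alice", 2, "p-002", {"caption": "second"}, "t/p-002.jpg")
print(recent_page("alice"))
```

The appeal is that rendering a page needs only the index row, while the full photo rows are spread across the cluster by photo_id; the cost is keeping the denormalized summary in sync with the photo row.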