If this is write-once, read-many data, you may get some benefit from packing all the info for a product into one column, using something like JSON for the column value.
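A minimal sketch of that packing, in plain Python and independent of any Cassandra client. The column name is the product id and the column value is a JSON string. The helper names `pack_product` and `unpack_product` are made up for illustration; the field names come from Phil's example, and amounts are kept as strings so they round-trip exactly.

```python
import json

def pack_product(price, tax, fees):
    """Serialize one product's fields into a single JSON column value."""
    return json.dumps({"price": price, "tax": tax, "fees": fees}, sort_keys=True)

def unpack_product(value):
    """Parse a JSON column value back into a dict of fields."""
    return json.loads(value)

# One row per day, one column per product (column name -> JSON value).
row = {
    "product_id_1": pack_product("12.44", "1.00", "3.00"),
    "product_id_2": pack_product("50.00", "4.00", "10.00"),
}

print(unpack_product(row["product_id_1"])["price"])
```

The trade-off is that a single field can no longer be read or updated on its own; the whole value has to be rewritten, which is why this suits write-once data.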

The one thing that stands out to me with this approach is the number of additional columns that will be created for a single key. Will the increase in columns create new issues I will need to deal with?
Millions of columns in a row may be ok, depending on the types of queries you want to run (some background: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/).

The more important issue is the byte size of the row. Wide rows take longer to compact and repair, and I try to avoid rows above a few tens of MB. By default, rows larger than 64MB go through a slower compaction path.
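A rough back-of-envelope check for the numbers in this thread can be sketched as below. Every constant is an assumption: the 15 bytes of per-column overhead (2-byte name length, 1 flag byte, 8-byte timestamp, 4-byte value length) is my understanding of the on-disk column layout, and the name and value sizes are guesses based on Phil's example rather than measurements.

```python
# Rough estimate of row size in bytes; every constant here is an assumption.
COLUMN_OVERHEAD = 15  # assumed: 2 (name length) + 1 (flags) + 8 (timestamp) + 4 (value length)
MB = 1024 * 1024

def estimate_row_bytes(num_columns, name_bytes, value_bytes):
    """Total bytes for a row of equally sized columns."""
    return num_columns * (COLUMN_OVERHEAD + name_bytes + value_bytes)

products = 2000000  # low end of the 2 to 4 million range

# Composite model: 3 columns per product, names like "product_id_1:price" (18 bytes), 8-byte values.
composite = estimate_row_bytes(products * 3, name_bytes=18, value_bytes=8)
# Packed model: 1 column per product, 12-byte name, value like "12.44:1.00:3.00" (15 bytes).
packed = estimate_row_bytes(products, name_bytes=12, value_bytes=15)

print("composite: %.0f MB" % (composite / float(MB)))
print("packed:    %.0f MB" % (packed / float(MB)))
```

Under these assumptions the composite layout comes to roughly 235 MB and the packed layout to roughly 80 MB for 2 million products, so both would be over the 64MB mark.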

Compression in 1.X will help where you have lots of repeating column names. 
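To get a feel for why repeated column names compress well, here is a small illustration using Python's `zlib`. It is only a stand-in: it does not use Cassandra's SSTable compressor or file format, and the fake column layout (`name=value` lines) is invented for the demo.

```python
import zlib

# Build a fake row: the same three attribute names repeat for every product.
columns = []
for i in range(10000):
    for attr in ("price", "tax", "fees"):
        columns.append("product_id_%d:%s=1.00" % (i, attr))
raw = "\n".join(columns).encode("utf-8")

# Deflate the whole blob and compare sizes.
compressed = zlib.compress(raw)
ratio = len(compressed) / float(len(raw))
print("raw=%d compressed=%d ratio=%.2f" % (len(raw), len(compressed), ratio))
```

The actual ratio on real data will depend on the compressor, the chunk size and how varied the values are.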

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 13/04/2012, at 7:32 AM, Dave Brosius wrote:

If you want to reduce the number of columns, you could pack all the data for a product into one column, as in


composite column name-> product_id_1:12.44:1.00:3.00
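One way to read that layout, sketched in plain Python. The field order (price, tax, fees) is assumed from Phil's example, `pack` and `unpack` are hypothetical helper names, and the sketch assumes product ids never contain a ":".

```python
from decimal import Decimal

FIELDS = ("price", "tax", "fees")  # assumed field order

def pack(product_id, price, tax, fees):
    """Join the product id and its amounts into one colon-separated string."""
    return ":".join([product_id, price, tax, fees])

def unpack(packed):
    """Split a packed string back into (product_id, {field: Decimal})."""
    parts = packed.split(":")
    product_id, amounts = parts[0], parts[1:]
    if len(amounts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(amounts)))
    return product_id, dict((name, Decimal(a)) for name, a in zip(FIELDS, amounts))

packed = pack("product_id_1", "12.44", "1.00", "3.00")
product_id, fields = unpack(packed)
print(product_id, fields["price"])
```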



On 04/12/2012 03:03 PM, Philip Shon wrote:
I am currently working on a data model where the purpose is to look up multiple products for given days of the year. Right now, that model uses a super column family, e.g.

"2012-04-12": {
 "product_id_1": {
   price: 12.44,
   tax: 1.00,
   fees: 3.00
 },
 "product_id_2": {
   price: 50.00,
   tax: 4.00,
   fees: 10.00
 }
}

I should note that for a given day/key, we are expecting in the range of 2 million to 4 million products (subcolumns).

With this model, I am able to retrieve any of the products for a given day using Hector's MultigetSuperSliceQuery.


I am looking into changing this model to use Composite column names. How would I go about modeling this? My initial thought is to migrate the above model into something more like the following.

"2012-04-12": {
 "product_id_1:price": 12.44,
 "product_id_1:tax": 1.00,
 "product_id_1:fees": 3.00,
 "product_id_2:price": 50.00,
 "product_id_2:tax": 4.00,
 "product_id_2:fees": 10.00,
}
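The mapping between the two layouts can be sketched with plain Python dicts. This is only a model of the idea: a real CompositeType stores each component as a separate typed part rather than a joined string, and `flatten` and `slice_product` are hypothetical helper names.

```python
def flatten(super_row, sep=":"):
    """Turn {product: {attr: value}} into {"product:attr": value}."""
    flat = {}
    for product_id, attrs in super_row.items():
        for attr, value in attrs.items():
            flat[product_id + sep + attr] = value
    return flat

def slice_product(flat_row, product_id, sep=":"):
    """Return one product's attributes, similar to slicing on the first component."""
    prefix = product_id + sep
    return dict((name[len(prefix):], value)
                for name, value in flat_row.items() if name.startswith(prefix))

day = {
    "product_id_1": {"price": "12.44", "tax": "1.00", "fees": "3.00"},
    "product_id_2": {"price": "50.00", "tax": "4.00", "fees": "10.00"},
}
flat = flatten(day)
print(len(flat), slice_product(flat, "product_id_2"))
```

Two products with three attributes each become six columns, which is the growth in column count asked about next.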

The one thing that stands out to me with this approach is the number of additional columns that will be created for a single key. Will the increase in columns create new issues I will need to deal with?

Are there any other thoughts about whether I should move forward with migrating this super column family to the model with composite column names?

Thanks,

Phil