What are the best practices for paging and slicing columns from a row?

So let's say I have 1,000,000 columns in a row.

I read the row but want one thread to read columns 0 - 9,999, a second thread (an actor in my case) to read columns 10,000 - 19,999, and so on, so that I can have 100 workers each processing 10,000 columns for each of my rows.
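
To make the split concrete, here is a minimal sketch of the range arithmetic only. It assumes columns could be addressed by an integer position, which is exactly the capability being asked about, and the name worker_ranges is hypothetical and not part of any client API.

```python
def worker_ranges(total_columns, num_workers):
    """Split positions [0, total_columns) into contiguous, inclusive
    (start, end) ranges, one per worker.

    Assumption: columns are addressable by integer position. This is
    plain arithmetic and does not talk to any datastore.
    """
    base, extra = divmod(total_columns, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional column each
        # when the total does not divide evenly.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break  # more workers than columns: nothing left to hand out
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    ranges = worker_ranges(1_000_000, 100)
    print(ranges[0], ranges[1], ranges[-1])
```

With 1,000,000 columns and 100 workers, each worker gets 10,000 consecutive positions, matching the split described above.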

If there is no API for this, is it something I should use a composite key for, populating the rows with a counter?


Going the composite key route and doing a start/end predicate would work, but then the insertion/load has to go through a single synchronized point to generate the column names. I am not opposed to this, but I would prefer that neither the load nor the processing of my data be bound by any single lock (even a distributed one).
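
To illustrate the concern, here is a rough in-memory sketch of the composite key idea. It is a stand-in and not a real client: the class CounterKeyedRow and its methods are hypothetical, the column name is assumed to be a (counter, name) pair, and a local threading.Lock plays the role of the single synchronized point.

```python
import bisect
import threading


class CounterKeyedRow:
    """In-memory stand-in for one wide row whose column names are
    composite (counter, name) pairs. Hypothetical, for illustration."""

    def __init__(self):
        self._lock = threading.Lock()  # the single synchronized point
        self._next = 0
        self._columns = []  # list of ((counter, name), value), sorted by key

    def insert(self, name, value):
        # Every writer must pass through this lock to obtain the next
        # counter value; this is the bottleneck described above.
        with self._lock:
            key = (self._next, name)
            self._next += 1
            # Counters only increase, so appending keeps the list sorted.
            self._columns.append((key, value))
            return key

    def slice(self, start, end):
        """Return columns whose counter lies in [start, end], inclusive,
        mimicking a start/end predicate on the composite name."""
        with self._lock:
            snapshot = list(self._columns)
        keys = [key for key, _ in snapshot]
        # A 1-tuple sorts before any 2-tuple sharing its first element,
        # so (start,) lands on the first key with counter >= start.
        lo = bisect.bisect_left(keys, (start,))
        hi = bisect.bisect_left(keys, (end + 1,))
        return snapshot[lo:hi]


if __name__ == "__main__":
    row = CounterKeyedRow()
    for i in range(25):
        row.insert(f"col{i}", i)
    print(len(row.slice(10, 19)))
```

Reads by range are cheap here, but every insert is serialized on the counter, which is the trade-off I would like to avoid.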


Joe Stein
Twitter: @allthingshadoop