I need to limit a MapReduce job so that it scans only a specific range of columns. The column family (CF) being processed contains wide rows, so I've set the 'widerow' argument of ConfigHelper.setInputColumnFamily() to true.
However, the word_count example on GitHub contains the following comment:
// this will cause the predicate to be ignored in favor of scanning everything as a wide row
ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY, true);
This suggests that ignoring the SlicePredicate for wide rows is by design - and this is certainly the behavior I've been witnessing. In which case, how do I limit the columns being scanned?
N.B. I can't set the 'widerow' flag to false, as that breaks Cassandra (too many columns are loaded at once, causing an OutOfMemory-style exception).
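
For illustration, here is a minimal sketch of the only fallback I can see: filtering the columns inside the mapper. It is plain Java with no Cassandra dependency. The class name ColumnRangeFilter and the method slice are hypothetical, and it assumes column names compare as plain strings. It still reads every column of the row, so it does not restrict the scan itself, which is what I'm actually after.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Cassandra or Hadoop.
class ColumnRangeFilter {

    // Keeps only the columns whose names fall in [start, finish], both inclusive.
    // Assumption: column names are compared as plain strings.
    static SortedMap<String, String> slice(SortedMap<String, String> columns,
                                           String start, String finish) {
        SortedMap<String, String> kept = new TreeMap<String, String>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String name = e.getKey();
            if (name.compareTo(start) >= 0 && name.compareTo(finish) <= 0) {
                kept.put(name, e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for one chunk of a wide row as handed to map().
        SortedMap<String, String> columns = new TreeMap<String, String>();
        columns.put("a", "1");
        columns.put("b", "2");
        columns.put("c", "3");
        columns.put("d", "4");
        System.out.println(slice(columns, "b", "c").keySet());
    }
}
```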