incubator-cassandra-user mailing list archives

From Anton Brazhnyk <anton.brazh...@genesys.com>
Subject Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Date Tue, 13 May 2014 22:46:20 GMT
Greetings,

I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just
part of it, something like Spark's sample() function.
Cassandra's API seems to allow this through its ConfigHelper.setInputRange(jobConfiguration,
startToken, endToken) method, but it doesn't work.
The range is simply ignored and the entire column family is scanned. It seems this feature
is just not supported,
and the source of AbstractColumnFamilyInputFormat.getSplits confirms that (IMO).
Questions:
1. Am I right that there is no way to read only the data within a given token range using ColumnFamilyInputFormat?
2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat,
so that the amount is predictable (for example, 5% of the entire dataset)?
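
To make question 2 concrete, here is a minimal sketch of how token bounds covering a fixed
fraction of the ring could be computed before being passed to ConfigHelper.setInputRange.
It assumes Murmur3Partitioner (tokens from -2^63 to 2^63-1); the class and method names
(TokenRangeSketch, rangeForFraction) are hypothetical and only for illustration. Because
partition keys are hashed roughly uniformly, 5% of the ring corresponds to approximately,
not exactly, 5% of the rows. Whether the input format honors the range is the open question.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class TokenRangeSketch {
    // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(64);

    /** Returns {startToken, endToken} covering the first `fraction` of the ring. */
    static long[] rangeForFraction(double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        // Width of the slice, truncated to a whole number of tokens.
        BigInteger width = new BigDecimal(RING_SIZE)
                .multiply(BigDecimal.valueOf(fraction))
                .toBigInteger();
        // Clamp so that fraction == 1.0 does not overflow past the largest token.
        BigInteger end = MIN_TOKEN.add(width).min(MAX_TOKEN);
        return new long[] { Long.MIN_VALUE, end.longValueExact() };
    }

    public static void main(String[] args) {
        long[] range = rangeForFraction(0.05);
        // setInputRange takes the tokens as strings, so they would be passed as
        // Long.toString(range[0]) and Long.toString(range[1]).
        System.out.println("start=" + range[0] + " end=" + range[1]);
    }
}
```

With RandomPartitioner the same idea would apply, but over tokens from 0 to 2^127 rather
than the signed 64-bit range.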


WBR,
Anton


