incubator-cassandra-user mailing list archives

From Ben Hood <>
Subject Re: 1000's of column families
Date Tue, 02 Oct 2012 19:01:40 GMT

On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

> Because the data for an index is not all together (i.e. you need a multiget to get the data).
> It is not contiguous.
> With the prefix in a partition, they keep the data so that all data for a prefix is, from
> what I understand, contiguous.

So you're saying that you can access the primary index with a key range, but to access the
secondary index, you first need to get all keys and follow up with a multiget, which would
use the secondary index to speed the lookup of the matching rows?
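
To make sure I am describing the two access paths the same way you are, here is a toy in-memory sketch of my reading. Everything in it is an assumption for illustration: the dict `rows`, the `city` column, and the helper names `key_range`, `multiget` and `by_city` are hypothetical and are not the Cassandra or Thrift API. It only models "contiguous key range" versus "index lookup followed by a multiget".

```python
from bisect import bisect_left

# Toy model only: a dict stands in for a column family (not the Cassandra API).
rows = {
    "user:001": {"city": "Oslo"},
    "user:002": {"city": "Bern"},
    "user:003": {"city": "Oslo"},
    "video:001": {"city": "Oslo"},
}

# Order-preserving layout: keys sharing a prefix sit next to each other.
sorted_keys = sorted(rows)


def key_range(start, end):
    """Primary access path: one contiguous slice of keys in [start, end)."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, end)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]


# Hypothetical secondary index: column value -> row keys that hold it.
city_index = {}
for key, cols in rows.items():
    city_index.setdefault(cols["city"], []).append(key)


def multiget(keys):
    """Fetch rows one key at a time; the keys need not be contiguous."""
    return {k: rows[k] for k in keys}


def by_city(city):
    """Secondary access path: look up the keys first, then multiget them."""
    return multiget(city_index.get(city, []))


if __name__ == "__main__":
    print(key_range("user:", "user;"))  # all keys with the "user:" prefix
    print(by_city("Oslo"))              # matching rows scattered across prefixes
```

In this sketch the prefix scan touches a single slice of the sorted keys, while the index lookup returns keys from different prefixes that each have to be fetched separately.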

> QUESTION: What I don't get in the comment is I assume you are referring to CQL in which
> case we would need to specify the partition (in addition to the index) which means all that
> data is on one node, correct? Or did I miss something there.

Maybe my question was just silly - I wasn't referring to CQL.

As for the locality of the data, I was hoping to be able to fire off an MR job to process
all matching rows in the CF - I was assuming that this job would get executed on the
same node as the data.

But I think the real confusion in my question has to do with the way the ColumnFamilyInputFormat
has been implemented, since it would appear that it ingests the entire (non-OPP) CF into Hadoop,
such that the predicate needs to be applied in the job rather than up front in the Cassandra


