cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: 1000's of column families
Date Tue, 02 Oct 2012 17:52:23 GMT
Because the data for an index is not all stored together (i.e. you need a multiget to fetch
the data). It is not contiguous.

With a prefix in a partition, the data is kept together, so from what I understand all data
for a prefix is contiguous.
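
To illustrate the difference, here is a toy sketch in plain Python. It is not Cassandra code
and uses no Cassandra API; the names (rows, table, index, prefix_scan, index_lookup) and the
sample data are made up for illustration. It only models the two access patterns: one
contiguous range read for a prefix, versus an index lookup followed by one point read per key.

```python
from bisect import bisect_left

# Toy model 1: rows kept sorted by key, so everything sharing a prefix sits together.
rows = sorted({"cf1:a": 1, "cf1:b": 2, "cf2:a": 3, "cf2:b": 4, "cf3:a": 5}.items())
sorted_keys = [k for k, _ in rows]

def prefix_scan(prefix):
    """Seek once to the first key with the prefix, then read until it stops matching."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key, value in rows[start:]:
        if not key.startswith(prefix):
            break
        result.append((key, value))
    return result

# Toy model 2: rows keyed by arbitrary ids; the "virtual column family" is just a field.
table = {"k9": ("cf2", 3), "k1": ("cf1", 1), "k5": ("cf2", 4), "k3": ("cf1", 2)}

# Hypothetical secondary index: field value -> list of row keys.
index = {}
for row_key, (vcf, _) in table.items():
    index.setdefault(vcf, []).append(row_key)

def index_lookup(vcf):
    """Step 1: get the matching keys from the index. Step 2: fetch each row separately."""
    matching_keys = index.get(vcf, [])
    return [(k, table[k][1]) for k in matching_keys]  # one point read per key (the "multiget")

print(prefix_scan("cf2:"))
print(index_lookup("cf2"))
```

In the first model the matching rows are adjacent, so a single seek is enough. In the second
the matching rows can be anywhere, which is why the keys have to be collected first.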

QUESTION: What I don't get in the comment is this: I assume you are referring to CQL, in which
case we would need to specify the partition (in addition to the index), which means all that
data is on one node, correct? Or did I miss something there?


From: Ben Hood
Date: Tuesday, October 2, 2012 11:18 AM
Subject: Re: 1000's of column families


On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:

Another option that may or may not work for you is the support in Cassandra 1.1+ to use a
secondary index as an input to your mapreduce job. What you might do is add a field to the
column family that represents which virtual column family it is part of. Then when doing
mapreduce jobs, you could use that field as the secondary index limiter. Secondary index mapreduce
is not as efficient since you first get all of the keys and then do multigets to get the data
that you need for the mapreduce job. However, it's another option for not scanning the whole
column family.

Interesting. This is probably a stupid question but why shouldn't you be able to use the secondary
index to go straight to the slices that belong to the attribute you are searching by? Is this
something to do with the way Cassandra is exposed as an InputFormat for Hadoop or is this
a general property for searching by secondary index?

