cassandra-user mailing list archives

From Aklin_81 <>
Subject Re: Finding the intersection results of column sets of two rows
Date Mon, 07 Feb 2011 05:30:50 GMT
Thanks Aaron & Shaun,

I think my question may have been unclear to some of you, so let me
explain my problem (and the solution I have in mind) again for the sake
of clarity:

Say I have 2 rows. The 1st row contains 60-70 columns and the 2nd row
contains hundreds of thousands of columns. The columns in both rows
are valueless. I just need to find the **common column names**
in the two rows. **Both rows are known to me**. So my plan is to
read all the **column names** of the 1st row (60-70 columns) and
request exactly those names from the 2nd row; whatever column names
come back are my result.
Would there be any problem with this solution? This is how I
expect to get the common column names.
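
A minimal sketch of the plan described above, in Python. `FakeColumnFamily`
is a hypothetical in-memory stand-in, not a real client class; its
`get(key, columns=...)` method is only assumed to behave like a by-name
column read (a slice predicate that lists column names), returning the
subset of the requested columns that exist in the row. The row keys and
column names are made up.

```python
class FakeColumnFamily:
    """Hypothetical in-memory stand-in for a column family (not a real client API)."""

    def __init__(self, rows):
        # rows: {row_key: {column_name: value}}
        self.rows = rows

    def get(self, key, columns=None):
        """Return the row's columns; if `columns` is given, only those that exist."""
        row = self.rows.get(key, {})
        if columns is None:
            return dict(row)
        return {name: row[name] for name in columns if name in row}


def common_column_names(small_cf, small_key, large_cf, large_key):
    """Intersect two rows' column names by asking the large row for the small row's names."""
    # Step 1: read every column name of the small row (60-70 columns).
    small_names = list(small_cf.get(small_key))
    if not small_names:
        return set()
    # Step 2: request exactly those names from the large row.
    found = large_cf.get(large_key, columns=small_names)
    # Whatever comes back is the intersection.
    return set(found)


# Valueless columns: the value is an empty string, only the names matter.
cf_a = FakeColumnFamily({"row1": {"c1": "", "c2": "", "c3": ""}})
cf_b = FakeColumnFamily({"row2": {f"c{i}": "" for i in range(2, 100000, 2)}})
common = common_column_names(cf_a, "row1", cf_b, "row2")
print(sorted(common))
```

With a real client, both steps would each be a single read against a known
row key, so the large row is never pulled in full.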

Please do not treat this as a JOIN case, as that leads to unnecessary
confusion. I just need the common column names from valueless columns in
the two rows.


Aaron, the intersection data is very much context-dependent. If
there are 10 million rows in CF A and 1 million in CF B, then the
intersection data would contain 10 million * 1 million rows. That
would require a huge and unaffordable amount of denormalization.
And finding the common columns in the client would mean pulling
unnecessary columns, e.g. pulling 100,000 columns from a row of which
only 60-70 are required.
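
To make the cost comparison concrete, here is a rough Python sketch that
counts how many columns each approach would move to the client. The sizes
mirror the numbers above; the column names and the counting model (one
unit per column returned) are simplifying assumptions, and server-side
read cost is ignored.

```python
def columns_transferred_client_side(small_row, large_row):
    """Pull both rows in full and intersect in the client."""
    common = set(small_row) & set(large_row)
    return common, len(small_row) + len(large_row)


def columns_transferred_by_name(small_row, large_row):
    """Pull the small row, then request only its names from the large row."""
    returned = [name for name in small_row if name in large_row]
    return set(returned), len(small_row) + len(returned)


small_row = {f"user{i}" for i in range(0, 70)}        # 70 column names
large_row = {f"user{i}" for i in range(50, 100050)}   # 100,000 column names

common_a, cost_a = columns_transferred_client_side(small_row, large_row)
common_b, cost_b = columns_transferred_by_name(small_row, large_row)
assert common_a == common_b  # both approaches find the same intersection
print(len(common_a), cost_a, cost_b)
```

Both approaches return the same names; only the number of columns sent
over the wire differs.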

Shaun, I hope the explanation above has clarified things a bit. Yes,
the rows whose common columns I need to find are known to me.

Thank you all,

On Mon, Feb 7, 2011 at 3:53 AM, Shaun Cutts <> wrote:
> In theory, you should be able to do joins by creating an extra column in one column family,
> holding the "foreign key" of the matching row in the other family.
> This assumes that the info you are joining on is available in both CFs (is not some sort
> of functional transformation).
> I have just found that the implementation for secondary indexes is not yet very close
> to optimal for more complex "joins" involving multiple indexes. I'm not sure if that affects
> you, as you didn't say what you are joining on.
> -- Shaun
> On Feb 6, 2011, at 4:22 PM, Aaron Morton wrote:
>> Is it possible for you to denormalise and write all the intersection values? Will
>> depend on how many, I guess.
>> The other alternative is to pull back more data than you need and do the intersection
>> in code in the client.
>> Hope that helps.
>> Aaron
>> On 7/02/2011, at 7:11 AM, Aklin_81 <> wrote:
>>> Hi,
>>> @buddhasystem: yes, that's a well-known solution. But obviously, since
>>> MySQL couldn't satisfy my needs, I am here. My question is in the context
>>> of Cassandra: is it possible to obtain the intersection of the
>>> columns of two rows in the way I described?
>>> @Edward: yes, I know that, but how does it fit here for obtaining the
>>> common columns of two rows?
>>> Thanks for your comments..
>>> -Asil
>>> On Sun, Feb 6, 2011 at 9:55 PM, Edward Capriolo <> wrote:
>>>> On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem <> wrote:
>>>>> Hello,
>>>>> If the amount of data is _that_ small, you'll have a much easier life
>>>>> with MySQL, which supports the "join" procedure -- because that's exactly
>>>>> what you want to achieve.
>>>>> asil klin wrote:
>>>>>> Hi all,
>>>>>> I want to obtain the intersection of the column sets of two rows (from
>>>>>> different column families).
>>>>>> To get the intersection, can I first retrieve all
>>>>>> columns (around 300) from the first row and then query by those column
>>>>>> names in the second row (which contains at most 100,000 columns)?
>>>>>> I am using the results at write time and not before presentation
>>>>>> to the user, so latency won't be much of a concern while writing.
>>>>>> Is this the proper way to get the intersection of two rows?
>>>>>> Would love to hear your comments.
>>>>>> ---------
>>>>>> Regards,
>>>>>> Asil
>>>> You can use multi-get when fetching lists of already known keys to
>>>> optimize your round trip time.
