incubator-cassandra-user mailing list archives

From Arijit Mukherjee <>
Subject Re: Cassandra newbie question
Date Wed, 27 Oct 2010 08:24:20 GMT
Hi All

I've another related question.

I am using a stream of records of the form (A, B, n) where the pair
(A,B) can occur multiple times. For example, you could have the
following set of records:

A, B, 2
P, Q, 5
X, Y, 3
A, B, 8
A, B, 2

The data store has a set of columns: (key, count, sum). Because the
pair (A,B) can repeat, I am using the string A+B as my key. Every time
a duplicate A+B arrives, I increment the count field and add "n" to
the existing value of sum. So, for the above set of records, Cassandra
should end up holding the following set:

A+B, 3, 12
P+Q, 1, 5
X+Y, 1, 3
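
To make the intended result concrete, here is a minimal sketch of the
aggregation in plain Python, with a dict standing in for the data
store. The function name `aggregate` and the "+" separator in the key
are only illustrative assumptions; nothing here talks to Cassandra.

```python
from collections import defaultdict


def aggregate(records):
    """Fold (a, b, n) records into {key: (count, sum)}, keyed by a+b."""
    totals = defaultdict(lambda: [0, 0])
    for a, b, n in records:
        entry = totals[a + "+" + b]  # hypothetical key format: "A+B"
        entry[0] += 1                # one more occurrence of this pair
        entry[1] += n                # running sum of n for this pair
    return {key: tuple(value) for key, value in totals.items()}


records = [
    ("A", "B", 2),
    ("P", "Q", 5),
    ("X", "Y", 3),
    ("A", "B", 8),
    ("A", "B", 2),
]
result = aggregate(records)
```

With the five sample records, `result` maps A+B to (3, 12), P+Q to
(1, 5) and X+Y to (1, 3), which is the set listed above.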

My question is: is it possible to have multiple threads reading
different streams so that I can parallelize the insertion mechanism?
What happens if two threads try to insert two different records with
the same A+B key at the same time?
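
The concern is the read-modify-write step: each update reads the
current (count, sum), changes it, and writes it back, so two threads
working on the same key can overwrite each other's result unless that
step is serialized. The sketch below only illustrates the pattern with
an in-memory stand-in; `InMemoryStore` and `consume` are hypothetical
names, and the lock is one possible way of serializing updates inside
a single client process, not something Cassandra provides.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for the data store: key -> (count, sum)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def add(self, key, n):
        # The read and the write happen under one lock, so concurrent
        # updates to the same key cannot overwrite each other.
        with self._lock:
            count, total = self._rows.get(key, (0, 0))
            self._rows[key] = (count + 1, total + n)

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


def consume(store, stream):
    """Apply every (a, b, n) record of one stream to the store."""
    for a, b, n in stream:
        store.add(a + "+" + b, n)


store = InMemoryStore()
streams = [
    [("A", "B", 2), ("P", "Q", 5)],
    [("X", "Y", 3), ("A", "B", 8), ("A", "B", 2)],
]
threads = [threading.Thread(target=consume, args=(store, s)) for s in streams]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A lock like this only helps when all writers live in the same process;
with several client processes, the same question comes back at the
data store level.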


On 11 October 2010 18:32, Gary Dusbabek <> wrote:
> On Mon, Oct 11, 2010 at 04:01, Arijit Mukherjee <> wrote:
>> Hi All
>> I've just started reading about Cassandra and writing simple tests
>> using Cassandra 0.6.5 to see if we can use it for our product.
>> I have a data store with a set of columns, like C1, C2, C3, and C4,
>> but the columns aren't mandatory. For example, there can be a list of
>> (k, v) pairs with only C1 and C2, but no C3 and C4. At the same time,
>> there can be a set of records with all the columns present. It's
>> possible to consider them as three sets A (with all columns), B (with
>> C1 and C2) and C (with C3 and C4). And I'm trying to find out the
>> following:
>> 1. A - B (all records that don't have C3 and C4) and A - C (all records
>> that don't have C1 and C2)
>> 2. records for whom C2 != C4
>> It's possible to pick all records and do this processing in my client
>> code - but that won't perform well. Is there any way to do these
>> within Cassandra? For example, by passing a list of column names so
>> that cassandra returns the records with only those columns?
> multiget_slice with the SlicePredicate specified using column_names
> can do the lookups.  As far as doing the set operations: no, Cassandra
> doesn't have the ability to do this server-side.
> Gary.
>> Regards
>> Arijit

"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."
