incubator-cassandra-user mailing list archives

From Arijit Mukherjee <ariji...@gmail.com>
Subject Re: Cassandra newbie question
Date Thu, 28 Oct 2010 05:43:21 GMT
Thanks, Gary.

I was thinking of using range partitioning to split the input.
Say, we could have different threads handling different ranges of
keys: (A-J) by thread1, (K-P) by thread2, and so on. Since each key
would belong to exactly one thread, there probably won't be any
chance of collision. But the thread that actually performs the
distribution could prove to be a bottleneck.
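
A minimal sketch of that idea in Python, just to make the shape concrete.
The range boundaries (A-J, K-P, Q-Z), the function names, and the in-memory
dicts standing in for the Cassandra writes are all assumptions for
illustration, not anything from an actual client library. A single
dispatcher routes each record to the worker that owns its key range, so no
two workers ever update the same A+B key:

```python
import queue
import threading
from bisect import bisect_left

# Hypothetical range boundaries: first letters A-J, K-P, Q-Z.
UPPER_BOUNDS = ["J", "P", "Z"]


def worker_for(key):
    """Return the index of the worker that owns this key's first letter."""
    first = key[0].upper()
    # Anything sorting after the last bound falls into the last range.
    return min(bisect_left(UPPER_BOUNDS, first), len(UPPER_BOUNDS) - 1)


def worker(inbox, totals):
    """Drain one inbox, keeping (count, sum) per key in a private dict.

    In a real setup the dict update would be a read-modify-write
    against the data store; the dict is only a stand-in.
    """
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more records for this range
            break
        key, n = item
        count, total = totals.get(key, (0, 0))
        totals[key] = (count + 1, total + n)


def aggregate(records):
    """Dispatch (A, B, n) records by key range and return merged totals."""
    inboxes = [queue.Queue() for _ in UPPER_BOUNDS]
    partials = [{} for _ in UPPER_BOUNDS]
    threads = [
        threading.Thread(target=worker, args=(q, t))
        for q, t in zip(inboxes, partials)
    ]
    for t in threads:
        t.start()

    # The single dispatcher loop: this is the potential bottleneck.
    for a, b, n in records:
        key = a + "+" + b
        inboxes[worker_for(key)].put((key, n))

    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()

    merged = {}
    for p in partials:
        merged.update(p)  # ranges are disjoint, so keys never clash
    return merged


if __name__ == "__main__":
    records = [("A", "B", 2), ("P", "Q", 5), ("X", "Y", 3),
               ("A", "B", 8), ("A", "B", 2)]
    print(aggregate(records))
```

For the five sample records this prints
{'A+B': (3, 12), 'P+Q': (1, 5), 'X+Y': (1, 3)}. The collision-freedom
only holds while every writer for a given range goes through the same
worker; a second process using different boundaries would break it.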

Am I correct in my thinking?

Regards
Arijit

On 27 October 2010 18:49, Gary Dusbabek <gdusbabek@gmail.com> wrote:
> On Wed, Oct 27, 2010 at 03:24, Arijit Mukherjee <arijit72@gmail.com> wrote:
>> Hi All
>>
>> I've another related question.
>>
>> I am using a stream of records of the form (A, B, n) where the pair
>> (A,B) can occur multiple times. For example, you could have the
>> following set of records -
>>
>> A, B, 2
>> P, Q, 5
>> X, Y, 3
>> A, B, 8
>> A, B, 2
>> ...
>>
>>
>> The data store has a set of columns - (key, count, sum). Because of
>> the possibility of duplicate A and B, I am using the string A+B as my
>> key. Every time there is a duplicate A+B, I update a count field, and
>> add "n" to the existing value of sum. So, for the above set of
>> records, cassandra should actually hold the following set -
>>
>> A+B, 3, 12
>> P+Q, 1, 5
>> X+Y, 1, 3
>> ...
>
> You want a distributed counter.
>
>>
>> My question is - is it possible to have multiple threads reading
>> different streams so that I can parallelize the insertion mechanism?
>> What may happen if two threads try to insert two different records
>> with the same A+B key?
>>
>
> No, this isn't going to work.  At some point Cassandra will have
> distributed counters, probably with a few caveats.  See
> https://issues.apache.org/jira/browse/CASSANDRA-1546 and related
> tickets for more information.
>
> The best approach I can suggest at this point is to continue inserting
> the increments as column names and then manually sum them up when you
> need to.  There are several approaches you could take if you're
> interested in consolidating slices of the increments that would be
> reasonably safe against the possibility of concurrent updates.
>
> Gary.
>
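
One possible reading of Gary's suggestion (store every increment as its
own column and sum on read), sketched in Python. The nested dict is an
in-memory stand-in for a column family, and the "<uuid>:<n>" column-name
layout and both function names are assumptions made for illustration; the
thread does not specify them. Because each write adds a new, uniquely
named column and never modifies an existing one, concurrent writers have
nothing to overwrite:

```python
import uuid
from collections import defaultdict

# In-memory stand-in for a column family: row key -> {column name: value}.
store = defaultdict(dict)


def record_increment(row_key, n):
    """Append one increment as a new column; never read or update old ones."""
    # Hypothetical layout: a unique id keeps duplicate values of n apart,
    # and the increment itself is carried in the column name.
    column_name = f"{uuid.uuid4()}:{n}"
    store[row_key][column_name] = b""  # the value is unused


def read_count_and_sum(row_key):
    """Recover (count, sum) by scanning every increment column of the row."""
    columns = store.get(row_key, {})
    increments = [int(name.rsplit(":", 1)[1]) for name in columns]
    return len(increments), sum(increments)


if __name__ == "__main__":
    for a, b, n in [("A", "B", 2), ("P", "Q", 5), ("X", "Y", 3),
                    ("A", "B", 8), ("A", "B", 2)]:
        record_increment(a + "+" + b, n)
    print(read_count_and_sum("A+B"))
```

The trade-off is that rows grow with every record and each read has to
scan the whole row, which is presumably why Gary mentions consolidating
slices of the increments.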



-- 
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."
