incubator-cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Using C* and CAS to coordinate workers
Date Fri, 04 Apr 2014 11:58:51 GMT
@Jan

 The subject of distributed workers and queues has been discussed on this
mailing list many times. One possible implementation is:

1) *p* data providers, *c* data consumers
2) create partitions (physical rows) holding a bounded number of columns
(let's say 10 000, not too big though). Partition key = bucket number (*#b*)
3) assign an integer id (*pId*) to each provider, and likewise an id (*cId*)
to each consumer
4) each provider may only write messages into buckets whose number satisfies
*#b mod p = pId mod p*
5) once the provider reaches 10 000 messages in a bucket, it switches to the
next one with *new #b = old #b + p*
6) the consumers follow the same rule for bucket switching, using *c* instead
of *p*: *new #b = old #b + c*

Example:

 p = 5, c = 3

 - p1 writes messages into buckets {1,6,11,16...} // 1, 1+5, 1+5+5, ....
 - p2 writes messages into buckets {2,7,12,17...} // 2, 2+5, 2+5+5,...
 - p3 writes messages into buckets {3,8,13,18...}
 - p4 writes messages into buckets {4,9,14,19...}
 - p5 writes messages into buckets {5,10,15,20...}

 - c1 consumes messages from buckets {1,4,7,10...} // 1, 1+3, 1+3+3...
 - c2 consumes messages from buckets {2,5,8,11...}
 - c3 consumes messages from buckets {3,6,9,12...}
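
The bucket rule above can be sketched in a few lines of Python. This is only
an in-memory illustration, assuming ids and bucket numbers start at 1 as in
the example; the names bucket_sequence and BucketWriter are hypothetical and
nothing here talks to Cassandra.

```python
from itertools import islice


def bucket_sequence(worker_id, worker_count):
    """Yield the bucket numbers owned by one provider or consumer.

    Every yielded bucket b satisfies
    b mod worker_count == worker_id mod worker_count.
    """
    bucket = worker_id  # assumption: the first bucket equals the worker id
    while True:
        yield bucket
        bucket += worker_count  # new #b = old #b + number of workers


class BucketWriter:
    """Tracks where a provider writes next (hypothetical helper)."""

    def __init__(self, provider_id, provider_count, bucket_size=10_000):
        self.buckets = bucket_sequence(provider_id, provider_count)
        self.bucket_size = bucket_size
        self.current = next(self.buckets)
        self.written = 0

    def next_slot(self):
        """Return (bucket, position) for the next message.

        Switches to the provider's next bucket once the current one is full.
        """
        if self.written == self.bucket_size:
            self.current = next(self.buckets)
            self.written = 0
        slot = (self.current, self.written)
        self.written += 1
        return slot


if __name__ == "__main__":
    # Buckets of p2 with p = 5, then of c3 with c = 3.
    print(list(islice(bucket_sequence(2, 5), 4)))
    print(list(islice(bucket_sequence(3, 3), 4)))
```

A real provider would use the (bucket, position) pair as partition key and
column name when writing the message.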

Of course, consumers must not put messages back into a bucket, otherwise the
per-bucket count (10 000 elements/bucket) no longer holds.

In addition, you can insert messages with a TTL so that "consumed buckets"
expire automatically after a while, saving you the hassle of cleaning up old
buckets to reclaim disk space.
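
As a rough sketch of the TTL idea, the helpers below only build CQL strings.
The table name message_buckets and its columns (bucket, position, payload)
are assumptions made for illustration; the USING TTL clause itself is
standard CQL, and the TTL value is in seconds.

```python
def create_table_cql(table="message_buckets"):
    """CQL for a hypothetical bucket table: one partition per bucket."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "bucket bigint, position int, payload text, "
        "PRIMARY KEY (bucket, position))"
    )


def insert_with_ttl_cql(ttl_seconds, table="message_buckets"):
    """CQL for a prepared insert whose row expires after ttl_seconds."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return (
        f"INSERT INTO {table} (bucket, position, payload) "
        f"VALUES (?, ?, ?) USING TTL {int(ttl_seconds)}"
    )


if __name__ == "__main__":
    print(create_table_cql())
    print(insert_with_ttl_cql(7 * 24 * 3600))  # expire after one week
```

The TTL should be comfortably longer than the time consumers may lag behind,
otherwise messages could expire before they are read.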


 There are also other implementations based on a distributed lock using C*
CAS, but the above algorithm does not require any lock.
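
For comparison, here is a minimal sketch of how a CAS-style conditional write
might protect the shared state from the original question. It is an in-memory
stand-in, not a Cassandra client and not a full distributed lock: the class
CheckpointStore, the function advance_checkpoint and the table named in the
comment are all hypothetical. In CQL the same idea would be a conditional
update (UPDATE ... IF column = ?) or INSERT ... IF NOT EXISTS for the first
write.

```python
import threading


class CheckpointStore:
    """In-memory stand-in for a table updated with conditional writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def read(self, feed):
        with self._lock:
            return self._values.get(feed)

    def compare_and_set(self, feed, expected, new):
        """Store new only if the current value still equals expected.

        Roughly what a conditional write such as
          UPDATE feed_checkpoint SET last_event = ? WHERE feed = ?
          IF last_event = ?
        would do (table and column names are assumptions).
        """
        with self._lock:
            if self._values.get(feed) != expected:
                return False  # another worker advanced the checkpoint first
            self._values[feed] = new
            return True


def advance_checkpoint(store, feed, process):
    """Read the checkpoint, process events after it, then try to commit.

    process(seen) returns the new checkpoint. The return value is True only
    if no other worker committed in between.
    """
    seen = store.read(feed)
    new = process(seen)
    return store.compare_and_set(feed, seen, new)
```

A worker whose commit returns False would discard or retry its work, so event
processing has to tolerate being repeated.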

Regards

 Duy Hai DOAN





On Fri, Apr 4, 2014 at 12:47 PM, prem yadav <ipremyadav@gmail.com> wrote:

> Oh ok. I thought you did not have a cassandra cluster already. Sorry about
> that.
>
>
> On Fri, Apr 4, 2014 at 11:42 AM, Jan Algermissen <
> jan.algermissen@nordsc.com> wrote:
>
>>
>> On 04 Apr 2014, at 11:18, prem yadav <ipremyadav@gmail.com> wrote:
>>
>> Cassandra can work, but to me it looks like you could use a persistent
>> queue (for example RabbitMQ) to implement this. All your workers can
>> subscribe to a queue.
>> In fact, why not just MySQL?
>>
>>
>> Hey, I have got a C* cluster that can (potentially) do CAS.
>>
>> Why would I set up a MySQL cluster to solve that problem?
>>
>> And yeah, I could use a queue or redis or whatnot, but I want to avoid
>> yet another moving part :-)
>>
>> Jan
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 11:44 PM, Jan Algermissen <
>> jan.algermissen@nordsc.com> wrote:
>>
>>> Hi,
>>>
>>> maybe someone knows a nice solution to the following problem:
>>>
>>> I have N worker processes that are intentionally masterless and do not
>>> know about each other - they are stateless and independent instances of a
>>> given service system.
>>>
>>> These workers need to poll an event feed, say about every 10 seconds and
>>> persist a state after processing the polled events so the next worker knows
>>> where to continue processing events.
>>>
>>> I would like to use C*'s CAS feature to coordinate the workers and
>>> protect the shared state (a row or cell in a C* keyspace, too).
>>>
>>> Has anybody done something similar and can suggest a 'clever' data model
>>> design and interaction?
>>>
>>>
>>>
>>> Jan
>>
>>
>>
>>
>
