incubator-cassandra-user mailing list archives

From Jan Algermissen <jan.algermis...@nordsc.com>
Subject Re: Using C* and CAS to coordinate workers
Date Fri, 04 Apr 2014 19:50:27 GMT
Hi Duy Hai,


On 04 Apr 2014, at 20:48, DuyHai Doan <doanduyhai@gmail.com> wrote:

> @Jan
> 
>  Your use-case is different from what I thought. So basically you have only one data source (the feed) and many consumers (the workers).
> 
>  Only one worker is allowed to consume the feed at a time.
> 
>  This can be modeled very easily using a distributed lock with C.A.S.

thank you - this confirms a design I am using in a similar scenario. However, I have problems there when concurrency gets high - maybe my C* version is a too-early 2.x release. But since you confirm the design, I’ll dig deeper into the issue. Also, I’ll try to spread out the workers using modulo calculations somehow.

Having said that - my impression was that what you suggest is sort of ‘brute force coordination’ and that there is likely a more clever design - like your queuing example. That is why I went to the list for answers.

Thank you either way!

Jan



> 
> CREATE TABLE feed_lock (
>     lock text PRIMARY KEY,
>     worker_token text
> ); 
> 
> 1) First initialize the table with: INSERT INTO feed_lock (lock) VALUES ('lock'). The primary key value is hard-coded and is always the same: 'lock'; it does not really matter. At this step, the column worker_token is null since we did not insert any value into it.
> 
> 2) If a worker w1 wants to get the lock: UPDATE feed_lock SET worker_token='token1' WHERE lock='lock' IF worker_token=null.
> w1 acquires the lock only if it is not held by any other worker (IF worker_token=null).
> 
> 3) Concurrently, if another worker w2 tries to acquire the lock, it will fail since worker_token is not null any more.
> 
> 4) w1 will release the lock with: UPDATE feed_lock SET worker_token=null WHERE lock='lock' IF worker_token='token1'.
> The important detail here is that only w1 knows the value of token1, so only w1 is able to release the lock.
> 
> This is a very simple design.
> 
>  Now it can happen that the lock is never released because the worker holding it crashed and the secret token is lost. To circumvent that, you can hold the lock using an update with TTL. The lock will then be released in any case once the TTL expires. If processing the feed takes longer than the TTL, the current worker can always extend the lease with another TTL update to reset the TTL value.
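
A minimal sketch of this acquire/renew/release flow using the DataStax Python driver (cassandra-driver); the keyspace name, contact point, TTL value and helper names are illustrative assumptions, not taken from the thread:

    # Sketch only: assumes a keyspace 'demo' containing the feed_lock table above
    # and a node reachable at 127.0.0.1 (pip install cassandra-driver).
    import uuid
    from cassandra.cluster import Cluster

    LEASE_TTL = 60  # seconds; the lock auto-expires after this unless the holder renews it

    session = Cluster(['127.0.0.1']).connect('demo')
    token = str(uuid.uuid4())  # secret token only this worker knows

    def acquire_lock():
        # Step 2: CAS update, applied only if no other worker currently holds the lock.
        rs = session.execute(
            f"UPDATE feed_lock USING TTL {LEASE_TTL} "
            "SET worker_token = %s WHERE lock = 'lock' IF worker_token = null",
            (token,))
        return rs.was_applied  # True if this worker now owns the lock

    def renew_lease():
        # Re-apply the TTL while still holding the lock (processing outlasts the TTL).
        rs = session.execute(
            f"UPDATE feed_lock USING TTL {LEASE_TTL} "
            "SET worker_token = %s WHERE lock = 'lock' IF worker_token = %s",
            (token, token))
        return rs.was_applied

    def release_lock():
        # Step 4: only the holder of the secret token can release the lock early.
        session.execute(
            "UPDATE feed_lock SET worker_token = null "
            "WHERE lock = 'lock' IF worker_token = %s",
            (token,))

A worker would call acquire_lock() on its periodic tick, poll the feed while calling renew_lease() from time to time, and finish with release_lock().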
> 
>  Hope that helps
> 
>  Regards
> 
>  Duy Hai DOAN
> 
> 
> 
>   
> 
> 
> On Fri, Apr 4, 2014 at 6:48 PM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:
> Hi DuyHai,
> 
> 
> On 04 Apr 2014, at 13:58, DuyHai Doan <doanduyhai@gmail.com> wrote:
> 
> > @Jan
> >
> >  This subject of distributed workers & queues has been discussed on the mailing list many times.
> 
> Sorry + thanks.
> 
> Unfortunately, I do not want to use C* as a queue, but to coordinate workers that page through an (XML) data feed of events every N seconds.
> 
> Let me try again:
> 
> - I have N instances of the same system, replicated to ensure work is being done despite failure of instances
> - the instances are masterless and know nothing about each other. Giving them an integer ID isn’t really possible, and the number of instances isn’t really known
> - there is a schedule controlling how often the feed is read, say once every minute
> - the schedule might change by way of an administrator of the ‘feed polling’
> - the worker instances check for work every 10 seconds or so
> - once a worker starts, it checks whether there is work to do (the schedule aspect) and, if so, starts polling the feed until the last event has been reached
> - during that time, no other worker may poll the feed
> - once the working worker is done, it saves the timestamp or ID of the last seen event and sets the next schedule
> 
> - the processing of the events might take much longer than the schedule intervals
> 
> I hope this explains better what I am up to. Maybe I can adapt your suggestion; I just do not see how.
> 
> Jan
> 
> 
> 
> > Basically one implementation can be:
> >
> > 1) p data providers, c data consumers
> > 2) create partitions (physical rows) with an arbitrary number of columns (let's say 10 000, not too big though). Partition key = bucket number (#b)
> > 3) assign an integer id (pId) to each provider, same for each consumer (cId)
> > 4) each provider can only write messages into buckets whose number satisfies #b mod p = pId mod p
> > 5) once the provider reaches 10 000 messages in a bucket, it switches to the next one, with new #b = old #b + p
> > 6) the consumers follow the same rule for bucket switching
> >
> > Example:
> >
> >  p = 5, c = 3
> >
> >  - p1 writes messages into buckets {1,6,11,16...} // 1, 1+5, 1+5+5, ....
> >  - p2 writes messages into buckets {2,7,12,17...} // 2, 2+5, 2+5+5,...
> >  - p3 writes messages into buckets {3,8,13,18...}
> >  - p4 writes messages into buckets {4,9,14,19...}
> >  - p5 writes messages into buckets {5,10,15,20...}
> >
> >  - c1 consumes messages from buckets {1,4,7,10...} // 1, 1+3, 1+3+3...
> >  - c2 consumes messages from buckets {2,5,8,11...}
> >  - c3 consumes messages from buckets {3,6,9,12...}
> >
> > Of course, consumers cannot re-put messages into the bucket, otherwise the counting (10 000 elements/bucket) is screwed up.
> >
> > Alternatively, you can insert messages with TTL to automatically expire "consumed buckets" after a while, saving you the hassle of cleaning up old buckets to reclaim disk space.
> >
> >
> >  There are also other implementations based on a distributed lock using C* C.A.S, but the above algorithm does not require any lock.
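
For illustration, a rough Python sketch of the bucket-numbering rule from steps 4-6 above; the function names and the 1-based start-bucket convention are assumptions, and tracking the 10 000-message fill count per bucket is left out:

    BUCKET_SIZE = 10_000  # max messages per bucket (partition), per step 2 above

    def provider_buckets(p_id, p):
        # Steps 4/5: provider p_id (1-based, p providers) writes to buckets where
        # #b mod p == p_id mod p, moving to bucket + p once the bucket holds BUCKET_SIZE messages.
        bucket = (p_id - 1) % p + 1
        while True:
            yield bucket
            bucket += p

    def consumer_buckets(c_id, c):
        # Step 6: consumers follow the same rule, with c consumers instead of p providers.
        bucket = (c_id - 1) % c + 1
        while True:
            yield bucket
            bucket += c

    # The example from the mail: p = 5 providers, c = 3 consumers
    from itertools import islice
    print(list(islice(provider_buckets(1, 5), 4)))  # [1, 6, 11, 16]
    print(list(islice(consumer_buckets(3, 3), 4)))  # [3, 6, 9, 12]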
> >
> > Regards
> >
> >  Duy Hai DOAN
> >
> >
> >
> >
> >
> > On Fri, Apr 4, 2014 at 12:47 PM, prem yadav <ipremyadav@gmail.com> wrote:
> > Oh ok. I thought you did not have a cassandra cluster already. Sorry about that.
> >
> >
> > On Fri, Apr 4, 2014 at 11:42 AM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:
> >
> > On 04 Apr 2014, at 11:18, prem yadav <ipremyadav@gmail.com> wrote:
> >
> >> Though Cassandra can work, to me it looks like you could use a persistent queue, for example RabbitMQ, to implement this. All your workers can subscribe to a queue.
> >> In fact, why not just MySQL?
> >
> > Hey, I have got a C* cluster that can (potentially) do CAS.
> >
> > Why would I set up a MySQL cluster to solve that problem?
> >
> > And yeah, I could use a queue or Redis or whatnot, but I want to avoid yet another moving part :-)
> >
> > Jan
> >
> >
> >>
> >>
> >> On Thu, Apr 3, 2014 at 11:44 PM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:
> >> Hi,
> >>
> >> maybe someone knows a nice solution to the following problem:
> >>
> >> I have N worker processes that are intentionally masterless and do not know about each other - they are stateless and independent instances of a given service system.
> >>
> >> These workers need to poll an event feed, say about every 10 seconds, and persist a state after processing the polled events so the next worker knows where to continue processing events.
> >>
> >> I would like to use C*’s CAS feature to coordinate the workers and protect the shared state (a row or cell in a C* keyspace, too).
> >>
> >> Has anybody done something similar and can suggest a ‘clever’ data model design and interaction?
> >>
> >>
> >>
> >> Jan
> >>
> >
> >
> >
> 
> 

