helix-user mailing list archives

From Vinayak Borkar <vbo...@yahoo.com>
Subject Re: State transitions of partitions
Date Thu, 28 Feb 2013 17:44:57 GMT
> Will this address your problem, we dont have distinct actions based on
> ERROR codes that controller will understand and take different actions.
> Were you looking for something like that ?

I will need to think more about this. I think the retry mechanism might 
be good enough for now.

> Good point on not differentiating if the partition once existed v/s newly
> created.  We actually plan to modify the drop notification
> behavior. Jason/Terence are discussing about this in another thread. Please
> add your suggestion to that thread. We should probably have a create and
> drop method(not transition) on the participants.

Currently, how do other systems that use Helix handle the bootstrapping 
process? When a resource is created for the first time, a participant's 
actions differ from those taken when an existing resource partition is 
expanded onto another instance. Specifically, there are three cases that 
need to be handled with respect to bootstrapping:

1. A cluster is up and running, and a new resource is created.
2. A cluster that had resources is being started after being shut down.
3. A cluster is running and a resource is already laid out on the 
cluster. Then some partitions are moved to instances that previously did 
not have any partitions of that resource.

I looked through the examples and found the ClusterMessagingService 
interface that can be used to send messages to instances in the cluster. 
I can see that case 3 can be handled using the messaging 
infrastructure. However, in both cases 1 and 2 the resource partitions 
will start in the OFFLINE state, and the messaging API cannot help 
because all instances in the cluster are in the same boat for a 
particular resource. So what is the preferred way to know whether you 
are in case 1 or case 2? One approach I see: if you have local 
artifacts matching the partitions that are transitioning from 
OFFLINE -> SLAVE, you could infer it is case 2. Is that how other 
systems solve this issue?
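
To make the idea concrete, here is a rough sketch of the artifact check 
I have in mind. The class name, data directory layout, and helper method 
are purely hypothetical (not Helix APIs); a real system would substitute 
its own storage layout inside the OFFLINE -> SLAVE transition handler:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BootstrapCheck {
    // Hypothetical local data root; a real participant would use
    // whatever directory its storage engine writes partition data to.
    private static final Path DATA_ROOT = Paths.get("/var/data/resources");

    /**
     * Infer whether an OFFLINE -> SLAVE transition is a restart
     * (case 2) or a fresh bootstrap (case 1) by checking for a local
     * artifact left behind by a previous incarnation of the partition.
     */
    public static boolean isRestart(String resource, String partition) {
        Path artifact = DATA_ROOT.resolve(resource).resolve(partition);
        return Files.exists(artifact);
    }

    public static void main(String[] args) {
        // Example: decide how to handle partition "orders_0" of
        // hypothetical resource "orders" during OFFLINE -> SLAVE.
        if (isRestart("orders", "orders_0")) {
            System.out.println("case 2: recover from local artifact");
        } else {
            System.out.println("case 1: bootstrap from scratch");
        }
    }
}
```

Of course this only distinguishes "I have seen this partition before" 
from "I have not"; it says nothing about whether the rest of the 
cluster is also starting cold, which is why I am asking how others 
approach it.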

On a separate note, is the messaging infrastructure general purpose? 
That is, can applications use it to perform RPC within the cluster, 
obviating the need for a separate RPC mechanism like Avro? I can see 
that the handler would need more code than one would write when using 
Avro to get RPC working, but my question is about the design intent of 
the messaging infrastructure.

