helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Modeling a pipeline in Helix
Date Sun, 02 Nov 2014 04:52:01 GMT
Hi Matt,

When you say you want to scale a particular pipeline up/down by adding
replicas, do you mean each replica will process a different set of items?
If that is the case, then you probably want to increase the number of
partitions and not the number of replicas.

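To make the partition/replica distinction concrete, here is a minimal sketch
in plain Python. It does not use the Helix API; `partition_for` and `place`
are hypothetical helpers, and the round-robin placement is only an assumption
for illustration, not how Helix's rebalancer actually assigns anything.

```python
import zlib


def partition_for(item_key: str, num_partitions: int) -> int:
    """Pick the partition that owns an item, using a stable hash of its key."""
    return zlib.crc32(item_key.encode("utf-8")) % num_partitions


def place(num_partitions: int, num_replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Naive round-robin placement (illustration only, not Helix's rebalancer)."""
    if num_replicas > len(nodes):
        raise ValueError("need at least one node per replica")
    return {
        p: [nodes[(p + r) % len(nodes)] for r in range(num_replicas)]
        for p in range(num_partitions)
    }


nodes = ["node1", "node2", "node3"]

# 3 partitions x 1 replica: each node owns a different slice of the items,
# so adding partitions spreads the work.
scaled_by_partitions = place(3, 1, nodes)

# 1 partition x 3 replicas: every node hosts a copy of the same partition,
# so adding replicas adds redundancy rather than splitting the items.
scaled_by_replicas = place(1, 3, nodes)

print(scaled_by_partitions)
print(scaled_by_replicas)
print(partition_for("item-42", 3))
```

In this sketch an item always lands in exactly one partition, so throughput
grows with the partition count, while the replica count only controls how
many nodes hold a copy of each partition.
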
I am assuming these stages are long-running. Is this assumption correct?

What do you want to happen when a node/process fails?

Kishore G

On Sat, Nov 1, 2014 at 5:39 PM, matt hoffman <matthoffman@acm.org> wrote:

> I'm doing a quick POC to write an event processing pipeline using Helix,
> and I have a couple of questions about the best way to model it.
> For simplicity, these pipelines are made up of a linear set of stages, and
> an item flows through each stage in order.
> I'd like to be able to scale a particular pipeline up and down (3
> instances of Pipeline A on the cluster, for example).  That seems
> straightforward to do in Helix -- it's just changing the number of replicas
> of a resource.
> Sometimes particular stages have to run on a particular set of machines
> (say, a stage that requires a GPU that is only on some machines in the
> cluster).  It looks like I would do that in Helix using the SEMI_AUTO
> rebalancing mode.
> For efficiency, I'd like Helix to try to group as many stages of the same
> pipeline onto the same machine as possible.  I don't want it to spread the
> stages across the cluster so that a single item has to hop from machine to
> machine any more than necessary.  I'm not sure how best to model this in
> Helix.
> In Helix's world, does it make sense to model stages as "partitions"
> within a pipeline "resource"?  Or should the stages be resources
> themselves?   And if they are resources, can I define a constraint or
> rebalancing algorithm that attempts to colocate them?   I see that the task
> execution recipe is pretty similar to what I want, and it models the
> individual stages as resources... so I'm guessing that's the best way to
> model it, but I don't know enough about Helix yet to know the pros and cons.
> I'm also assuming that the new "task" abstraction in Helix 0.7 probably
> isn't what I want; it seems to be modeling something with a discrete
> execution like a MapReduce mapper, as opposed to a stage that items flow
> through.  Am I correct?
> Thanks for any advice you can give!
> matt
