helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: HDFS read load distribution using helix
Date Mon, 19 Jun 2017 18:11:39 GMT
That should work.
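The sketch below shows the bucketing idea Shekhar describes further down: the HDFS directory is treated as one resource, and the filename's hash modulo a bucket count decides which task a file belongs to. The class and method names (`FileBucketing`, `bucketFiles`, `bucketOf`) are hypothetical and not part of Helix. Mapping a bucket index onto a Helix partition or task is an assumption left to the application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not a Helix API): groups the files of one HDFS
 * directory into a fixed number of buckets by hashing the file name.
 * Each bucket index could serve as one task of the directory's resource.
 */
public final class FileBucketing {

    private FileBucketing() {}

    /** Returns the bucket index in [0, numBuckets) for a file name. */
    public static int bucketOf(String fileName, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // String.hashCode() has a fixed, documented formula, so every JVM
        // computes the same value; floorMod keeps negative hashes in range.
        return Math.floorMod(fileName.hashCode(), numBuckets);
    }

    /** Groups file names by bucket index; buckets with no files are omitted. */
    public static Map<Integer, List<String>> bucketFiles(List<String> fileNames, int numBuckets) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String name : fileNames) {
            buckets.computeIfAbsent(bucketOf(name, numBuckets), k -> new ArrayList<>()).add(name);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> files = List.of("part-00000", "part-00001", "part-00002", "part-00003");
        // An instance that owns bucket k would process only the files in bucket k.
        System.out.println(bucketFiles(files, 2));
    }
}
```

Using `String.hashCode()` matters here. Its formula is fixed by the JDK, so every short-lived instance computes the same bucket for a given file. Changing the bucket count reassigns most files, so it is probably best kept fixed for a given directory.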

On Mon, Jun 19, 2017 at 9:14 AM, Shekhar Bansal <shekhar0058@yahoo.com>
wrote:

> Thanks a lot, Kishore.
> I think I can treat the HDFS directory as a resource and the filename's hash
> (modulo some bucket count) as tasks. Is there a better way of doing it in
> Helix?
>
> Thanks
> Shekhar
>
>
> On Monday, June 19, 2017 8:15 PM, kishore g <g.kishore@gmail.com> wrote:
>
>
>
>    1. Currently, Helix ensures an even distribution of partitions within a
>    resource, not across resources. Is it possible for you to add your tasks
>    as part of the same resource?
>    2. & 3. Yes, you can start the controller as part of your process. But
>    since you said you launch this on Kubernetes every 5 minutes, I suggest
>    keeping the controller and ZooKeeper running all the time. Controllers are
>    lightweight, and you can get away with a very basic, entry-level container
>    spec. It's OK to launch Helix Participants every 5 minutes.
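The following toy model illustrates point 1 and the "node1 got master for all resources" symptom described further down. It assumes each resource is balanced on its own by a deterministic round-robin that starts at the first node. Helix's real rebalancer is more sophisticated, and `PlacementToyModel` is purely illustrative. The point it shows is narrow: balancing scoped to one resource cannot spread many single-partition resources across nodes, whereas one resource with many partitions can be spread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Helix code): each resource is balanced independently by
 * assigning its partitions round-robin, always starting from the first node.
 */
public final class PlacementToyModel {

    private PlacementToyModel() {}

    /** Returns how many partitions each node receives across all resources. */
    public static Map<String, Integer> place(List<String> nodes, List<Integer> partitionsPerResource) {
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String node : nodes) {
            load.put(node, 0);
        }
        for (int partitions : partitionsPerResource) {
            // Balancing is scoped to this one resource, so the round-robin
            // restarts at nodes.get(0) for every resource.
            for (int p = 0; p < partitions; p++) {
                load.merge(nodes.get(p % nodes.size()), 1, Integer::sum);
            }
        }
        return load;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3");
        // One resource per file: six resources with one partition each.
        System.out.println(place(nodes, List.of(1, 1, 1, 1, 1, 1))); // prints {node1=6, node2=0, node3=0}
        // One resource for the whole directory, with six partitions.
        System.out.println(place(nodes, List.of(6)));                // prints {node1=2, node2=2, node3=2}
    }
}
```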
>
> You should consider using the Helix Task Framework
> <http://helix.apache.org/0.6.7-docs/tutorial_task_framework.html>. It
> provides all the functionality you need.
>
>
> On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <shekhar0058@yahoo.com>
> wrote:
>
> I have a standalone (containerised) Java app that reads data from HDFS, does
> some transformations and writes the data to remote storage. I want to make it
> scalable by launching multiple instances of this Java app. My problem is
> how to assign tasks among these instances. Can Helix solve this problem?
>
> If yes, can you please help me with the following:
>
>    1. I followed the Helix quickstart example and created one resource per
>    file, but node1 was assigned master for all resources. Is this because of
>    the simple StateModelDefinition used in the quickstart example, am I using
>    it the wrong way, or is it a limitation of Helix?
>    2. I want to avoid running a separate controller process. If I start a
>    controller as part of setup (in standalone mode), will Helix be able to
>    elect a master controller? Is it advisable to run tens of controllers in
>    distributed mode?
>    3. I schedule my app every five minutes using a Kubernetes cron job. Is it
>    advisable to use Helix for such short-lived processes?
>
>
>
> Thanks
> Shekhar
