helix-user mailing list archives

From Shekhar Bansal <shekhar0...@yahoo.com>
Subject Re: HDFS read load distribution using helix
Date Mon, 19 Jun 2017 16:14:12 GMT
Thanks a lot, Kishore. I think I can treat the HDFS directory as a resource and the modulus
of each filename's hash as the tasks. Is there a better way of doing this in Helix?
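
To make the hash-modulus idea concrete, here is a minimal sketch in plain Java. It is
independent of Helix: the class name FilePartitioner, its methods, the sample file names,
and the partition count of 3 are all hypothetical and chosen only for illustration. It
assumes a fixed partition count agreed on by every instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Helix API): maps file names to partition ids
// by taking the file name's hash modulo a fixed partition count.
final class FilePartitioner {
    private FilePartitioner() {}

    // Math.floorMod keeps the result in [0, numPartitions) even when
    // String.hashCode() returns a negative value.
    static int partitionFor(String fileName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(fileName.hashCode(), numPartitions);
    }

    // Groups file names by the partition they hash to, ordered by partition id.
    static Map<Integer, List<String>> groupByPartition(List<String> fileNames, int numPartitions) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            groups.computeIfAbsent(partitionFor(name, numPartitions), k -> new ArrayList<>())
                  .add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Illustrative file names only; a real run would list the HDFS directory.
        List<String> files = Arrays.asList("part-00000", "part-00001", "part-00002", "part-00003");
        groupByPartition(files, 3).forEach(
            (partition, names) -> System.out.println("partition " + partition + " -> " + names));
    }
}
```

Each instance would then process only the files whose partition id it currently owns.
Because the mapping depends only on the file name and the partition count, every instance
computes the same answer without coordination. The partition count has to stay fixed,
though, since changing it moves most files to a different partition.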

    On Monday, June 19, 2017 8:15 PM, kishore g <g.kishore@gmail.com> wrote:

   - Currently, Helix ensures even distribution of partitions within a resource, not across
resources. Is it possible for you to add tasks as part of the same resource?
   - 2 & 3: Yes, you can start the controller as part of your process. But since you said
you launch this on Kubernetes every 5 minutes, I suggest keeping the controller and ZooKeeper
running all the time. Controllers are lightweight, and you can get away with an entry-level
container spec. It's OK to launch Helix Participants every 5 minutes.
You should consider using the Helix Task Framework. It provides all the functionality you need.

On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <shekhar0058@yahoo.com> wrote:

I have a standalone Java app (containerised). It reads data from HDFS, does some transformations,
and writes the data to remote storage. I want to make it scalable by launching multiple instances
of this Java app. My problem is how to assign tasks among these instances. Can Helix solve
this problem?
If yes, can you please help me with the following:
   - I followed the Helix quickstart example and created 1 resource per file, but node1 was
assigned master for all resources. Is this because of the simple StateModelDefinition used in
the quickstart example, am I using it the wrong way, or is it a limitation of Helix?

   - I want to avoid running a separate controller process. If I start a controller as
part of setup, will Helix be able to elect a master controller (in standalone mode)? Is it
advisable to run tens of controllers in distributed mode?

   - I schedule my app every five minutes using Kubernetes cron. Is it advisable to use Helix
for such short-lived processes?

