helix-user mailing list archives

From Shirshanka Das <shirsha...@gmail.com>
Subject Re: HDFS read load distribution using helix
Date Mon, 19 Jun 2017 19:29:35 GMT
You might also want to look at Gobblin, which uses Helix in a very similar way and is actually
used to read data from HDFS, apply transformations, and load the results into a remote store.

Shirshanka


> On Jun 19, 2017, at 11:11 AM, kishore g <g.kishore@gmail.com> wrote:
> 
> That should work.
> 
>> On Mon, Jun 19, 2017 at 9:14 AM, Shekhar Bansal <shekhar0058@yahoo.com> wrote:
>> Thanks a lot, Kishore.
>> I think I can treat the HDFS directory as a resource and the modulo of each filename's hash
>> as its tasks. Is there a better way of doing this in Helix?
>> 
>> Thanks
>> Shekhar
>> 
>> 
>> On Monday, June 19, 2017 8:15 PM, kishore g <g.kishore@gmail.com> wrote:
>> 
>> 
>> Currently, Helix ensures even distribution of partitions within a resource, not across
resources. Is it possible for you to add tasks as part of the same resource?
>> Regarding your second and third questions: yes, you can start the controller as part of
>> your process. But since you said you launch this on Kubernetes every 5 minutes, I suggest
>> keeping the controller and ZooKeeper running all the time. Controllers are lightweight, and
>> you can get away with an entry-level container spec. It's OK to launch Helix participants
>> every 5 minutes.
>> You should consider using the Helix Task Framework. It provides all the functionality
>> you need.
>> 
>> 
>> On Mon, Jun 19, 2017 at 7:24 AM, Shekhar Bansal <shekhar0058@yahoo.com> wrote:
>> I have a standalone Java app (containerised). It reads data from HDFS, does some
>> transformations, and writes the data to remote storage. I want to make it scalable by
>> launching multiple instances of this Java app. My problem is how to assign tasks among
>> these instances. Can Helix solve this problem?
>> 
>> If yes, can you please help me with the following:
>> 1. I followed the Helix quickstart example and created one resource per file, but node1 was
>> assigned master for all resources. Is that because of the simple StateModelDefinition used
>> in the quickstart example, am I using it the wrong way, or is it a limitation of Helix?
>> 2. I want to avoid running a separate controller process. If I start the controller as part
>> of setup, will Helix be able to elect a master controller (in standalone mode)? Is it
>> advisable to run tens of controllers in distributed mode?
>> 3. I schedule my app every five minutes using a Kubernetes cron job. Is it advisable to use
>> Helix for such short-lived processes?
>> 
>> 
>> Thanks
>> Shekhar
>> 
>> 
>> 
> 
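
The scheme discussed in the thread (one resource for the HDFS directory, with each file mapped to a partition by the modulo of its filename's hash) can be sketched in plain Java roughly as follows. This is only an illustration of the hashing step, not Helix code: the class name FilePartitioner, the method names, the sample file names, and the partition count of 4 are all hypothetical, and the sketch assumes String.hashCode() is an acceptable hash. Registering the resource with Helix and reading the files is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups file names into a fixed number of partitions.
public class FilePartitioner {

    // Maps a file name to a partition index in [0, numPartitions).
    // Math.floorMod is used because String.hashCode() can be negative,
    // and the plain % operator would then yield a negative index.
    static int partitionFor(String fileName, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(fileName.hashCode(), numPartitions);
    }

    // Groups file names by partition index. Each key would correspond to one
    // partition of the single resource discussed in the thread (assumption).
    static Map<Integer, List<String>> groupByPartition(List<String> fileNames, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String name : fileNames) {
            byPartition
                .computeIfAbsent(partitionFor(name, numPartitions), k -> new ArrayList<>())
                .add(name);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        List<String> files = Arrays.asList(
            "events-0001.avro", "events-0002.avro", "events-0003.avro", "events-0004.avro");
        groupByPartition(files, 4).forEach(
            (partition, names) -> System.out.println("partition " + partition + " -> " + names));
    }
}
```

An instance that owns a given partition would then process only the files that map to that index. Because the mapping depends only on the file name and the partition count, every instance computes the same grouping without coordination; the trade-off is that changing the partition count reshuffles most files.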
