samza-dev mailing list archives

From Milinda Pathirage <mpath...@umail.iu.edu>
Subject Re: How should Samza be run on AWS?
Date Wed, 05 Aug 2015 19:31:02 GMT
Hi all,

It looks like the GitHub deployment used by my university doesn't allow public
access, so I moved the repository to github.com
(https://github.com/milinda/samza-ec2-ansible).

Thanks
Milinda

On Wed, Aug 5, 2015 at 2:03 PM, Milinda Pathirage <mpathira@umail.iu.edu>
wrote:

> I wrote several Ansible playbooks to deploy YARN (without HDFS),
> Zookeeper and Kafka to EC2 for deploying Samza jobs. If you know Ansible,
> those scripts may be helpful. You can find them at
> https://github.iu.edu/mpathira/samza-ec2-ansible. I was planning to add a
> document describing these scripts but couldn't do it yet. I also looked at
> EMR, but as I remember, the EMR job deployment model doesn't work with the
> current scripts provided by Samza.
>
> I used R3 instances for Kafka and C3 instances for YARN. As I remember, I
> could get close to 1 million msg/s with a 3-node Kafka cluster running on
> r3.xlarge instances and a 2 (or 4) node YARN cluster running 4 stream tasks
> per job.
>
> Thanks
> Milinda
>
> On Wed, Aug 5, 2015 at 11:27 AM, Gian Merlino <gianmerlino@gmail.com>
> wrote:
>
>> I don't know of any tutorials, but the order to tackle things would be:
>>
>> 1) Set up ZK- this could be a single node install for a PoC or a 3 or 5
>> node install for production. m3.medium is a reasonable node type.
>>
>> 2) Set up Kafka- could be a single instance without replication for a PoC.
>> For production, as many as you need, and you'd probably want replication. I
>> think if you want to use local instance storage, i2 instances are good, and
>> if you want to use EBS, probably m3 instances.
>>
>> 3) Set up YARN- this could be a single instance (running pseudo-distributed
>> with master & slave on the same machine) or two instances (one master, one
>> slave) for a PoC. I think c3 or r3 instance types are good for the slaves,
>> depending on how much memory you need. Workloads without large amounts of
>> state should be ok on c3 instances.
>>
>> EMR might actually work for YARN if you use the long-running kind of
>> cluster (see:
>> http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-longrunning-transient.html
>> ). I haven't tried that, but it might be worth a shot before going for stock
>> Apache Hadoop.
>>
>> On Tue, Aug 4, 2015 at 5:58 PM, Job-Selina Wu <swucareer99@gmail.com>
>> wrote:
>>
>> > Dear All: I was looking for a tutorial on how to build and run Samza on
>> > AWS and then I found the link below. I am wondering if there is a detailed
>> > tutorial about how to build Samza on AWS?
>> >
>> > Sincerely,
>> > Selina
>> >
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/SAMZA/FAQ#FAQ-HowshouldSamzaberunonAWS?
>> > How should Samza be run on AWS?
>> >
>> > From Gian Merlino:
>> >
>> >    - We've been using Samza in production on AWS for a little over a
>> >    month. We're just using the YARN runner on a mostly stock hadoop 2.4.0
>> >    cluster (not EMR). Our experience is that c3s work well for the YARN
>> >    instances and i2s work well for the Kafka instances. Things have been
>> >    pretty solid with that setup. For scaling up and scaling down YARN, we
>> >    just terminate instances or add instances, and this works pretty well.
>> >    It can take a few minutes for the cluster to realize a node has gone
>> >    and respawn containers elsewhere. We have a separate Kafka cluster just
>> >    for Samza's use, different from our main Kafka cluster. The main reason
>> >    is that we wanted to isolate off the disk and network load of state
>> >    compactions and restores (we don't use compacted topics in our main
>> >    Kafka cluster, but we do use them with Samza, and the extra load on
>> >    Kafka can be substantial).
>> >
>>
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
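
For anyone skimming the thread, the sizing hints quoted above can be collected
into a small lookup table. This is only a rough sketch in Python: the names
SIZING, SETUP_ORDER and plan are hypothetical, the values are copied from the
quoted messages, and none of it is official Samza or AWS guidance.

```python
# Sizing hints gathered from the quoted messages in this thread.
# Hypothetical helper for illustration; not part of Samza or any AWS tool.
SIZING = {
    "zookeeper": {
        "poc": "1 node",
        "production": "3 or 5 nodes",
        "instances": ["m3.medium"],
    },
    "kafka": {
        "poc": "1 instance, no replication",
        "production": "as many as needed, with replication",
        "instances": ["i2 (local instance storage)", "m3 (EBS)"],
    },
    "yarn": {
        "poc": "1 instance (pseudo-distributed) or 2 (one master, one slave)",
        "production": "not stated in the thread",
        "instances": ["c3 (little state)", "r3 (more memory)"],
    },
}

# Order in which the thread suggests tackling the components.
SETUP_ORDER = ["zookeeper", "kafka", "yarn"]


def plan(stage):
    """Return one summary line per component for 'poc' or 'production'."""
    if stage not in ("poc", "production"):
        raise ValueError("stage must be 'poc' or 'production'")
    lines = []
    for step, name in enumerate(SETUP_ORDER, start=1):
        entry = SIZING[name]
        types = ", ".join(entry["instances"])
        lines.append(f"{step}) {name}: {entry[stage]}; instance types: {types}")
    return lines


if __name__ == "__main__":
    for line in plan("poc"):
        print(line)
```

Calling plan("production") gives the same three steps with the production
sizing where the thread mentions one.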



