cloudstack-users mailing list archives

From Shanker Balan <>
Subject Re: local storage ?
Date Wed, 16 Apr 2014 05:26:23 GMT
Hi Zack,

Comments inline.

On 15-Apr-2014, at 9:24 pm, Zack <> wrote:

> Issue being that ideally, I'd like to take advantage of the logic built into Hadoop and
> Kafka to leverage JBOD for parallel I/O.  My worry is that zfs drivers won't be as clever
> with application specific I/O optimizations.
> Z

The only I/O optimisation I can think of would be parallelisation of reads/writes
across the spindles.

Looking at how AWS EMR works, it uses S3 instead of HDFS for its storage requirements.
The ephemeral disk(s) are only used during a job run. I feel this is a superior
scaling design when compared to using local disks.


(1) Save storage space by eliminating the need for the HDFS 3x replication factor. Rather
than keeping 3 copies of each log, let the storage system handle dedup and striping across
the JBODs.
(2) Save on bandwidth between TOR switches.
(3) Application lifecycle management becomes simple as there is no stateful data on the
compute nodes. I can blow away the data on the local disk without worrying about data loss.
(4) Scale storage independently of compute nodes.
(5) Parallel object PUT/GET requests are possible with object storage.

(3) and (4) are really the deal clinchers for me. One node going down should not bring
Hadoop to a standstill, and I should be able to manage storage requirements without
investing in additional compute nodes (hypervisors), or vice versa.
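
As a rough illustration of point (1), here is a small Python sketch of the capacity arithmetic. The 3x figure is the HDFS default replication factor; the 100 TB data set and the 1.5x overhead for the alternative backend are assumptions chosen only for the example, and the real overhead depends entirely on which storage system you pick and how it is configured.

```python
def raw_capacity_tb(logical_tb: float, overhead_factor: float) -> float:
    """Raw disk (TB) needed to hold logical_tb of data at a given redundancy overhead."""
    if logical_tb < 0 or overhead_factor < 1:
        raise ValueError("logical_tb must be >= 0 and overhead_factor >= 1")
    return logical_tb * overhead_factor


HDFS_REPLICATION = 3.0          # HDFS default: three copies of every block
ASSUMED_BACKEND_OVERHEAD = 1.5  # hypothetical figure, not a property of any named product

logical_tb = 100.0              # illustrative data set size
hdfs_raw = raw_capacity_tb(logical_tb, HDFS_REPLICATION)
backend_raw = raw_capacity_tb(logical_tb, ASSUMED_BACKEND_OVERHEAD)

print(f"HDFS at 3x replication: {hdfs_raw:.0f} TB raw")
print(f"Assumed backend at 1.5x overhead: {backend_raw:.0f} TB raw")
print(f"Difference: {hdfs_raw - backend_raw:.0f} TB")
```

The saving only materialises if the storage layer's own redundancy overhead is below 3x, so it is worth checking that number for whichever system you evaluate.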

You can consider building a distributed storage service using your JBOD. These days,
most distributed storage systems (Gluster, Riak CS) provide an S3 interface, and
stock Hadoop (?) seems to support S3 natively now. Let Gluster/Riak handle
striping across the JBODs.
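
To make point (5) concrete, here is a minimal Python sketch of issuing PUT and GET requests in parallel. `InMemoryObjectStore` is a hypothetical stand-in written for this example, not a real S3, Gluster or Riak client; the key names and worker count are likewise assumptions. With a real S3-compatible client, the `put`/`get` calls would become network requests, which is where the thread pool would pay off.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-style object store (not a real client)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put(self, key, data):
        with self._lock:
            self._objects[key] = data

    def get(self, key):
        with self._lock:
            return self._objects[key]


def parallel_put(store, items, workers=4):
    """PUT every (key, data) pair in items using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all PUTs and re-raises any worker exception.
        list(pool.map(lambda kv: store.put(*kv), items.items()))


def parallel_get(store, keys, workers=4):
    """GET all keys concurrently and return a {key: data} dict."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as keys.
        return dict(zip(keys, pool.map(store.get, keys)))


# Illustrative log chunks; the key naming scheme is an assumption.
chunks = {f"logs/part-{i:04d}": f"chunk {i}".encode() for i in range(8)}

store = InMemoryObjectStore()
parallel_put(store, chunks)
fetched = parallel_get(store, list(chunks))
print(f"fetched {len(fetched)} objects")
```

Each object is independent, so requests can be spread across as many workers (and backend nodes) as the storage service can handle.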

As for Kafka, it's been a while since I tracked the project. I am not certain whether it
supports object storage. Please do check.

On the whole, I would opt for object storage as a service for your different
workloads rather than an app-specific cloud.

Also, given the adoption of EMR + S3 on AWS, object storage seems to be the way
forward.


