cloudstack-users mailing list archives

From bruc...@v365.com.au
Subject Re: Primary Storage
Date Tue, 23 Oct 2012 07:30:16 GMT
  

hi Julien

Sadly there is no simple answer. It depends on how much time and money you
want to spend, how much of your customers' data you want to retain, and at
what cost to the customer it is important to back up. If you suffer VM loss
because of a SAN failure, expect many complaints, and customers will leave.
Our customers are all businesses, so we know we would lose them as customers
if we lost their data.

Here's our experience, for everyone to criticize... LOL.


Footnote: we saved $65K, which we pass on to our customers, and now have an
80Gb/s replicated SAN which is very reliable.

We had the same issue. We've been working with Fibre Channel for years as
contractors, and were faced with an 8Gb FC upgrade plus a 10GbE upgrade;
together they would have cost over $40K to upgrade our FC and Ethernet
switches, plus $15K for mezzanine adapters, all at second-hand pricing. New
would have cost about $80K+. Having worked with all the main storage vendors
for years while contracting, I knew it would cost a bundle. Which is partly
why cloud costs so much.

So we tried Infiniband, as there is enough support for ESX, CentOS and
Ubuntu, which we use as hypervisors.

2 x 2nd-hand QDR 40Gb/s 36-port switches - $3,600 each
4 x HP QDR 40Gb C7000 chassis switches - $1,500 each
36 x HP QDR Infiniband HCAs - $129 each (~$4.5K)

Total outlay: roughly $15K.

We already had 2 x C7000 chassis spare, plus BL460 servers, so we also
grabbed some HP BL2x220c G6 servers, as they have two motherboards each,
giving us much denser compute: 36 servers (dual X5670s per server, 4 CPUs
per blade) per C7000 chassis, with dual 1Gb NICs plus dual 40Gb IB. We use
10:1Gb Virtual Connect on the chassis, so there's no need to go up to 10GbE.

So each server (two per blade) has 2 x 1Gb NICs and 2 x Infiniband
connections; the Infiniband presents 4 x 10Gb virtual NICs plus 2 x SRP
paths at 40Gb/s per blade motherboard.


Zoning (the IB equivalent of VLANs) is limited, so we team the 2 x 1Gb
Ethernet NICs, which carry the production VLANs for internet traffic only.
That's plenty of bandwidth for most virtual web servers, of which we don't
go over 20 VMs per server, so each virtual server has a dedicated ~100Mb/s,
bursting to 2Gb/s, on the internet VLANs (i.e. a routed public IP or a
load-balanced VIP). Not all servers on each blade are internet facing, so
there is plenty of bandwidth. We also load balance to VIP clusters which sit
on different blades, avoiding I/O bottlenecks for the HA web farms, of which
we have a couple; the biggest is spread across 8 blades and can handle 8Gb/s
of internet load, as we have 8 x 1Gb provider links in an HA load-balanced
cluster.
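
To spell that allocation out (a rough Python sketch, assuming the teamed
2 x 1Gb uplink is shared evenly by the ~20 VMs on a server; it is an
illustration of the arithmetic, not a throughput guarantee):

    # Illustrative only: per-VM share of a teamed 2 x 1Gb uplink,
    # assuming the ~20 VMs per server described above split it evenly.
    teamed_gbps = 2 * 1.0              # two 1Gb NICs teamed
    vms_per_server = 20
    per_vm_mbps = teamed_gbps * 1000 / vms_per_server
    print(f"~{per_vm_mbps:.0f} Mb/s per VM, bursting to {teamed_gbps:.0f} Gb/s")
    # -> ~100 Mb/s per VM, bursting to 2 Gb/s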

On top of that we have 2 x 40Gb SRP paths which connect to the primary SAN,
and 4 x 10Gb IPoIB for production networks and iSCSI.

To build our clustered Infiniband storage array we learnt Infiniband, then
Ubuntu (as it's fairly easy to implement), and OSNexus for NFS/IP-based
iSCSI. OSNexus is OK on Infiniband: we get 6-8Gb/s, or about 600-800MB/s
read/write, using IPoIB and iSCSI. But you need 3-6 months of playing with
IB to figure it all out.

We also have Open-E DSS licenses, but we found the same hardware with
Open-E (via Infiniband) only did about 350MB/s, and we had no control over
the OS at all. So we ditched Open-E; 2 x 4TB licenses are for sale if anyone
wants them. They cost about $900 per 4TB license.

The OSNexus licenses cost only $695 x 2 for 8TB x 2 (16TB in total), so
better value for us. We use OSNexus for secondary storage NFS and CIFS,
which is all carried on the production Infiniband IP networks. Using OSNexus
meant we didn't need to drop to the CLI all the time; the GUI is very easy
to use and very stable.

This also gives us superb security, as the Infiniband IP network is not
routed to the internet; it's only for iSCSI/NFS/CIFS backup data. SANs 3 and
4 are also connected with dual-port 40Gb/s Infiniband using IPoIB.

The two OSNexus servers run as primary/secondary SANs, so SAN 2 is a
replica, constantly replicating from SAN 1. The SAN hardware was fairly
cheap: a 2RU server (including a 12-bay SAS/SATA backplane) + motherboard +
dual PSU + RAM + LSI 9260 RAID + X5650 CPU + 7 x 2TB WD RE SATA III drives
in RAID 6 with 1 hot spare came to about $4,500.

We have now migrated to CentOS 6.3 using Red Hat storage manager + SCST SRP
+ iSCSI. Wow, it's very powerful. The primary SAN is 2 commodity servers:
2 x hex-core Intel, 32GB ECC RAM, 2 x LSI 9280-8i, a 32-port 6Gb/s expander
chassis, dual power, 2 x 250GB SSD for CacheCade, and 30 x 1TB WD
VelociRaptor drives (200MB/s R/W each) in RAID 6 arrays, each with 1 spare
drive. So 16 drives per LSI array - 1 hot spare - 2 for RAID 6 parity = 13
drives @ 200MB/s R/W = 2,600MB/s, holding about 12TB raw. Two arrays in the
same server give us 24TB of replicated SAN, which is presented as file I/O
using SCST.
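
Written out as a back-of-envelope calculation (a rough sketch using only the
figures quoted above; real throughput will depend on the controller and
workload):

    # Back-of-envelope for one of the LSI RAID 6 arrays described above.
    # Figures are the ones quoted in this thread, not benchmarks.
    drives_per_array = 16
    hot_spares = 1
    parity_drives = 2                  # RAID 6
    drive_mb_s = 200                   # quoted VelociRaptor read/write
    drive_tb = 1

    data_drives = drives_per_array - hot_spares - parity_drives    # 13
    array_mb_s = data_drives * drive_mb_s                          # 2,600 MB/s
    array_tb = data_drives * drive_tb          # ~13TB, "about 12TB" after overheads

    print(f"{data_drives} data drives -> ~{array_mb_s} MB/s and ~{array_tb} TB per array")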

We format with XFS and present it via SCST as file I/O, and cluster and
replicate the volumes using RHCM.


Our backup policy is a daily VMware snapshot, a daily archive using Veeam
Backup to the SRP SAN, and a daily DR backup to archive. As our hypervisors
are VMware, we can snapshot into the primary storage pool.

We decided to let VMware manage the storage for each VM; with multiple 2TB
LUNs presented (6 x 2TB) we have 6 LUNs which can be used. We use a simple
formula: VM drive size x 2, multiplied by the customer's VMs. Approximate
drive size is 20-40GB for Linux and Win2K VMs, so we use the larger: 40GB x
2 = 80GB, so a 2TB LUN fits about 50 hosted VMs. Of course snapshots never
use the full size of the LUN, and we thin provision, so the LUN never
actually gets close to 100% full. If it did, SNMP would warn us, as would
vCenter. Also, VDirector will not provision to LUNs with less than 20%
storage left.
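
A minimal sketch of that sizing rule as I read it (the thin-provisioning
ratio below is an assumption, not a measured figure; it is what lets more
than 25 fully-allocated VMs share a 2TB LUN in practice):

    # Hypothetical sketch of the LUN sizing rule of thumb described above.
    def vms_per_lun(lun_gb=2048, vm_disk_gb=40, headroom=2,
                    thin_ratio=0.5, reserve=0.20):
        # headroom   - the "drive size x 2" budget per VM
        # thin_ratio - assumed fraction of that budget actually consumed,
        #              since thin-provisioned disks and snapshots rarely fill it
        # reserve    - provisioning stops when a LUN has less than 20% free
        budget_gb = vm_disk_gb * headroom          # 80 GB budgeted per VM
        expected_gb = budget_gb * thin_ratio       # ~40 GB actually consumed
        usable_gb = lun_gb * (1 - reserve)
        return int(usable_gb // expected_gb)

    print(vms_per_lun())   # -> 40 under these assumptions; the post lands on ~50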

The snapshots happen over SRP very quickly; a 40GB LUN usually only takes
60-90 seconds. When VMware begins its backups at 2am there are as many as
20-30 snapshots running and archives being updated. Because of the IB
network and the fast primary, secondary, third and fourth SANs on IB, over
300 VMs are backed up three times in 4-5 hours: one snapshot, then a Veeam
backup, then a vCenter backup via CIFS.

We use vCenter backup and Data Recovery to make a VMFS copy of each VM,
which compresses the data. It keeps 8 weeks of backups on CIFS/NFS storage
(the 16TB above), using CIFS over Infiniband to the QuantaStor SANs. This is
a free service; we don't charge for it, as we felt it 100% essential that we
can retrieve a 2TB LUN instantly via the replicated SAN, or one day's
snapshot, and, all else failing, 8 weeks of daily snaps from the 3rd/4th
SAN. Our customers are all businesses, so we know we would lose them as
customers if we lost their data.

So yes, storage is a major investment and has to be managed, and it takes
time to see what works. But as we use Infiniband, the speed of it has been a
godsend; I'm sure it would be much slower over 10GbE or Fibre Channel. It's
fast and very reliable.

Cheers

Bruce M

On 23.10.2012 12:55, Caleb Call wrote:
> If I'm using fiber (which is what we do) I'm going directly to the
> node, not through another device that then shares it out over much
> slower iSCSI or NFS. I realize using fiber is more expensive for
> most, but for us it's the cheaper option because our fiber
> infrastructure is already built out, our iSCSI not so much.
>
> On Oct 22, 2012, at 10:04 PM, bruce.m@v365.com.au wrote:
>
>> I'd suggest everyone have a look at www.osnexus.com - it supports fiber,
>> 10Gb and Infiniband using the SCST iSCSI code from
>> http://scst.sourceforge.net/. It has NFS and all the good stuff,
>> including a pretty good GUI and replication (lumbering is not there
>> yet), and it runs on Ubuntu.
>>
>> On 23.10.2012 11:40, Andreas Huser wrote:
>>> Hi,
>>>
>>> for Cloudstack I use Solaris 11 ZFS + GlusterFS over Infiniband
>>> (RDMA). That gives the best performance and the most scalable storage.
>>> I have tested some different solutions for primary storage, but most
>>> are too expensive and not economic for a CloudStack cluster, or have
>>> poor performance.
>>>
>>> My configuration:
>>> Storage node: Supermicro server (Intel hardware) with Solaris 11, with
>>> SSD write and read cache (read Crucial m4, write ZeusIOPS), GlusterFS
>>> and a dual-port ConnectX 40Gbit/s Infiniband adapter.
>>>
>>> I have installed GlusterFS directly on Solaris with modified code.
>>> If you want to build bigger systems for more than 50 VMs, it is better
>>> to split Solaris and GlusterFS, with a separate head node for
>>> GlusterFS.
>>>
>>> That looks like:
>>> Solaris ZFS backend storage with a dataset volume (thin provisioned)
>>> --> (SRP target attached directly, without an Infiniband switch, to
>>> the GlusterFS node) --> GlusterFS node, the SRP target formatted with
>>> an XFS filesystem, create a GlusterFS volume --> (Infiniband over a
>>> Mellanox port switch) --> CloudStack node mounts the GlusterFS volume
>>> over RDMA.
>>>
>>> For the dataset volume on the ZFS storage, disable atime and enable
>>> compression. (Space reclaim.) With compression you can shrink the ZFS
>>> volume with dd from /dev/zero at the Linux command line, or with
>>> sdelete in a Windows VM. That gives you space back on the primary
>>> storage for files deleted inside a VM, or for vhd's or VMs deleted in
>>> CloudStack.
>>>
>>> greetings, Andreas
>>>
>>> Kind regards
>>>
>>> Andreas Huser
>>> Managing Director
>>> System Engineer / Consultant
>>> (Cisco CSE, SMBAM, LCSE, ASAM)
>>> ---------------------------------------
>>> Zellerstraße 28 - 77654 Offenburg
>>> Tel: +49(781) 12786898
>>> Mobil: +49(176) 10308549
>>> ahuser@7five-edv.de
>>>
>>> ----- Original Message -----
>>> From: "Outback Dingo"
>>> To: cloudstack-users@incubator.apache.org
>>> Sent: Tuesday, 23 October 2012 02:15:16
>>> Subject: Re: Primary Storage
>>>
>>> On Mon, Oct 22, 2012 at 8:09 PM, Ivan Rodriguez wrote:
>>>> Solaris 11 ZFS, and yes we tried different setups: RAID levels,
>>>> number of SSD cache, ARC zfs options etc etc etc.
>>>>
>>>> Cheers
>>>
>>> VMWare ??
>>>
>>>> On Tue, Oct 23, 2012 at 11:05 AM, Outback Dingo wrote:
>>>>
>>>>> On Mon, Oct 22, 2012 at 8:03 PM, Ivan Rodriguez wrote:
>>>>>> We are using ZFS, with JBOD, not in production yet, exporting NFS
>>>>>> to CloudStack. I'm not really happy about the performance, but I
>>>>>> think it is related to the hardware itself rather than the
>>>>>> technology. We are using Intel SR2625UR and Intel 320 SSDs. We were
>>>>>> evaluating Gluster as well, but we decided to move away from that
>>>>>> path since Gluster NFS is still performing poorly, plus we would
>>>>>> like to see CloudStack integrating the gluster-fuse module. We
>>>>>> haven't decided the final storage setup, but at the moment we had
>>>>>> better results with ZFS.
>>>>>
>>>>> Question is whose ZFS, and have you "tweaked" the ZFS / NFS config
>>>>> for performance?
>>>>>
>>>>>> On Tue, Oct 23, 2012 at 10:44 AM, Nik Martin wrote:
>>>>>>
>>>>>>> On 10/22/2012 05:49 PM, Trevor Francis wrote:
>>>>>>>
>>>>>>>> ZFS looks really interesting to me and I am leaning that way. I
>>>>>>>> am considering using FreeNAS, as people seem to be having good
>>>>>>>> luck with it. Can anyone weigh in here?
>>>>>>>
>>>>>>> My personal opinion: I think FreeNAS and OpenFiler have horrible,
>>>>>>> horrible user interfaces - not very intuitive, and they both seem
>>>>>>> to be file servers with things like iSCSI targets tacked on as an
>>>>>>> afterthought.
>>>>>>>
>>>>>>> Nik
>>>>>>>
>>>>>>>> Trevor Francis
>>>>>>>> Partner
>>>>>>>> 46 Labs | The PeerEdge Cloud
>>>>>>>> http://www.46labs.com | http://www.peeredge.net
>>>>>>>> 405-362-0046 - Voice | 405-410-4980 - Cell
>>>>>>>> trevorgfrancis - Skype
>>>>>>>> trevor@46labs.com
>>>>>>>> Solutions Provider for the Telecom Industry
>>>>>>>> <http://www.twitter.com/peeredge> <http://www.facebook.com/PeerEdge>
>>>>>>>>
>>>>>>>> On Oct 22, 2012, at 2:30 PM, Jason Davis wrote:
>>>>>>>>
>>>>>>>>> ZFS would be an interesting setup as you can do the cache pools
>>>>>>>>> like you would do in CacheCade. The problem with ZFS or
>>>>>>>>> CacheCade+DRBD is that they really don't scale out well if you
>>>>>>>>> are looking for something with a unified name space. I'll say
>>>>>>>>> however that ZFS is a battle-hardened FS with tons of shops
>>>>>>>>> using it. A lot of the whiz-bang SSD+SATA disk SAN things these
>>>>>>>>> smaller start-up companies are hocking are just ZFS appliances.
>>>>>>>>>
>>>>>>>>> RBD looks interesting but I'm not sure if I would be willing to
>>>>>>>>> put production data on it; I'm not sure how performant it is
>>>>>>>>> IRL. From a purely technical perspective, it looks REALLY cool.
>>>>>>>>>
>>>>>>>>> I suppose anything is fast if you put SSDs in it :) GlusterFS is
>>>>>>>>> another option, although historically small/random IO has not
>>>>>>>>> been its strong point.
>>>>>>>>>
>>>>>>>>> If you are OK spending money on software and want a scale-out
>>>>>>>>> block storage, then you might want to consider HP LeftHand's VSA
>>>>>>>>> product. I am personally partial to NFS plays :) I went the
>>>>>>>>> exact opposite approach and settled on Isilon for our primary
>>>>>>>>> storage for our CS deployment.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 22, 2012 at 10:24 AM, Nik Martin wrote:
>>>>>>>>>
>>>>>>>>>> On 10/22/2012 10:16 AM, Trevor Francis wrote:
>>>>>>>>>>
>>>>>>>>>>> We are looking at building a Primary Storage solution for an
>>>>>>>>>>> enterprise/carrier class application. However, we want to
>>>>>>>>>>> build it using a FOSS solution and not a commercial solution.
>>>>>>>>>>> Do you have a recommendation on platform?
>>>>>>>>>>
>>>>>>>>>> Trevor,
>>>>>>>>>>
>>>>>>>>>> I got EXCELLENT results building a SAN from FOSS using:
>>>>>>>>>> OS: CentOS
>>>>>>>>>> Hardware: 2x storage servers, with 12x 2TB 3.5" SATA drives,
>>>>>>>>>> LSI MegaRAID with CacheCade Pro, with 240GB Intel 520 SSDs
>>>>>>>>>> configured to do SSD caching (alternately, look at FlashCache
>>>>>>>>>> from Facebook), and Intel 10Gb dual-port NICs, one port for
>>>>>>>>>> crossover, one port for uplink to the storage network
>>>>>>>>>> DRBD for real-time block replication, active-active
>>>>>>>>>> Pacemaker+Corosync for HA resource management
>>>>>>>>>> tgtd for the iSCSI target
>>>>>>>>>>
>>>>>>>>>> If you want file-backed storage, XFS is a very good filesystem
>>>>>>>>>> on Linux now.
>>>>>>>>>>
>>>>>>>>>> Pacemaker+Corosync can be difficult to grok at the beginning,
>>>>>>>>>> but that setup gave me a VERY high performance SAN. The
>>>>>>>>>> downside is it is entirely managed by CLI, no UI whatsoever.
>>>>>>>>>>
>>>>>>>>>>> Trevor Francis
>>>>>>>>>>> Partner
>>>>>>>>>>> 46 Labs | The PeerEdge Cloud
>>>>>>>>>>> http://www.46labs.com | http://www.peeredge.net
>>>>>>>>>>> 405-362-0046 - Voice | 405-410-4980 - Cell
>>>>>>>>>>> trevorgfrancis - Skype
>>>>>>>>>>> trevor@46labs.com
>>>>>>>>>>> Solutions Provider for the Telecom Industry
>>>>>>>>>>> <http://www.twitter.com/peeredge> <http://www.facebook.com/PeerEdge>


