cloudstack-users mailing list archives

From Fabrice Brazier <fabrice.braz...@apalia.net>
Subject RE: Primary Storage
Date Tue, 23 Oct 2012 10:30:50 GMT
Hi Andreas,

Hmm, that's pretty cool. I know they still have trouble with Infiniband on
Nexenta, but it's clearly a priority on their roadmap (plus I trust the
Nexenta team more than Oracle to evolve ZFS).

I agree that iSCSI over Infiniband increases latency, but most of the time it's
simply easier to use IPoIB than native IB. For example, if you use VMware, the
Infiniband support is provided by Mellanox and not by VMware, so if you have
an issue, VMware support probably won't help you.

About the RAID, I'm a fan of RAID 10: I'd rather build a RAID 10 from 7200 rpm
SAS drives than multiple raidz/raidz2 vdevs of 15k SAS drives, particularly for
a virtual environment with a random workload and many concurrent accesses.

I'm a fan of NFS, so agreed about the ZIL, and ZeusRAM is the only real choice
for that (with a classical SSD you can hit the write hole).

Agreed on compression too (but only lzjb; gzip uses too much CPU). Disabling
atime does reduce the IOPS load, but I'm not sure it's really a best practice.
About deduplication I don't totally agree: it really depends on your array,
the workload and the VM types on your CloudStack. When I build a ZFS array I
count 1 GB of RAM per 1 TB of disk; with deduplication I count 2 GB of RAM per
1 TB of disk (plus in Nexenta 4 they will add the ability to keep the
deduplication metadata on SSD, like the L2ARC for the read cache).

The last point is about your drives: why SATA drives? SATA drives don't keep
to their latency specs, and the reliability of a SAS drive is about 10x that
of SATA. Plus, you can now find many NL-SAS drives at low cost.

But it's really a nice architecture. I've never tried GlusterFS (for the
moment), and in that case it's really a good way to get a metro-cluster
replacement for free. I've tried one of its competitors (OneFS), and clearly
clustered filesystems are the future.

Cheers,
Fabrice

-----Original Message-----
From: Andreas Huser [mailto:ahuser@7five-edv.de]
Sent: Tuesday, October 23, 2012 11:37
To: cloudstack-users@incubator.apache.org
Subject: Re: Primary Storage

Hi Fabrice,

I don't know what other people do, but I have no problems with Infiniband +
GlusterFS + CloudStack. I don't use Nexenta: it's based on Illumos and does
not work well with Infiniband.
I have two different clusters in production environments.

The first: Solaris 11 with GlusterFS 3.3 built in, exporting the Gluster
volume over RDMA. Performance is okay; you can use that for smaller
environments.
The second is a little more complex, with a GlusterFS server in the middle.

ZFS server: based on Solaris 11
1.) Create a zpool with at least two vdevs and SSD read/write cache.
2.) Create a thin-provisioned volume ("zfs create -V"), disable atime and
enable compression (do not enable dedup!), then export it as an SRP target
to the GlusterFS server over a direct connection without an IB switch (a
rough sketch follows below).
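
A rough sketch of those two steps on Solaris 11 (pool, volume and disk names
are placeholders, and the exact COMSTAR/SRP service names can differ between
releases):

# Pool with two mirrored vdevs, an SSD read cache (L2ARC) and a mirrored SSD log (ZIL)
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
zpool add tank cache c1t4d0
zpool add tank log mirror c1t5d0 c1t6d0
zfs set atime=off tank

# Sparse (thin-provisioned) volume with compression on; dedup stays at its default (off)
zfs create -s -V 2T -o compression=on tank/gfsvol

# Export the zvol as an SRP target via COMSTAR (service name assumed)
svcadm enable -r stmf
svcadm enable -r ibsrp/target
stmfadm create-lu /dev/zvol/rdsk/tank/gfsvol
stmfadm add-view <GUID printed by create-lu>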

GlusterFS server:
1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6).
2.) Use the OFED drivers from https://www.openfabrics.org.
3.) Import the SRP target from the ZFS server and format it as XFS.
4.) Create a Gluster volume, "volume create xy transport rdma" (use only
rdma; see the sketch after this list).
5.) Connect the second IB port to an IB switch.
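
A minimal sketch of steps 3 and 4, assuming the imported SRP disk shows up as
/dev/sdb, the brick lives under /bricks/srp0, and the GlusterFS server is
named gfs01:

# Format and mount the SRP-backed disk as the brick
mkfs.xfs -f /dev/sdb
mkdir -p /bricks/srp0
mount /dev/sdb /bricks/srp0

# Create and start an rdma-only Gluster volume on that brick
gluster volume create gfsvol transport rdma gfs01:/bricks/srp0
gluster volume start gfsvol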

CloudStack hypervisor node:
1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6).
2.) Use the OFED drivers from https://www.openfabrics.org.
3.) Mount the Gluster volume (see the mount sketch below).
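
A minimal mount sketch, reusing the assumed names gfs01 and gfsvol from above:

# Native GlusterFS client on the hypervisor
yum install glusterfs glusterfs-fuse
mkdir -p /mnt/gfs-primary
mount -t glusterfs gfs01:/gfsvol /mnt/gfs-primary
# On a volume created with both tcp and rdma transports, append ".rdma"
# to the volume name to force the RDMA transport:
# mount -t glusterfs gfs01:/gfsvol.rdma /mnt/gfs-primary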

ZFS Thin Volume ---- Infiniband SRP ----> GlusterFS ---- GFSVol rdma ---->
IB Switch ----> Clients

The ZFS and GlusterFS servers form one storage unit, connected directly with
40 Gbit Infiniband point-to-point. You don't even notice there is a cable in
between!

Important: if you have Infiniband, do not use iSCSI over IPoIB! If you already
have Infiniband, you should use its advantages: IPoIB has a higher latency
than SRP!


SRP latency (in usec):

-- SRP --
local address: LID 0x01 QPN 0x44004b PSN 0xf3265b RKey 0x9804237c VAddr
0x00000001dda000 remote address: LID 0x0a QPN 0x10004a PSN 0x44072e RKey
0x1c0f115 VAddr 0x000000088e6000
------------------------------------------------------------------
 #bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
 2       1000          1.29           125.30       1.31
------------------------------------------------------------------

-- IPoIB ---
[root@sv01sfogaa ~]# ping 10.200.0.10
PING 10.200.0.10 (10.200.0.10) 56(84) bytes of data.
64 bytes from 10.200.0.10: icmp_seq=1 ttl=255 time=0.147 ms
64 bytes from 10.200.0.10: icmp_seq=2 ttl=255 time=0.116 ms

When you put load on IPoIB, the latency increases, and that is not good.
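
The SRP numbers above look like output from one of the OFED perftest latency
tools; assuming that, the comparison can be reproduced roughly like this (the
peer's IPoIB address is used only for connection setup):

# RDMA latency (perftest package from OFED); defaults to 2-byte messages
ib_write_lat                  # on one node
ib_write_lat 10.200.0.10      # on the other node, prints t_min/t_max/t_typical in usec

# IPoIB latency for comparison
ping -c 100 10.200.0.10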


That is my recommendation for a simple GlusterFS mirror:

- Supermicro server with Intel hardware and expander backplane
- 1x Crucial M4 SSD read cache
- 2x ZeusIOPS SSD write cache (mirrored)
- SATA 24/7 hard drives
- LSI HBA 9207 or 9211
- ConnectX-2 QDR dual-port Infiniband adapter (HP refurbished with full
warranty for $100). Important: flash the newest firmware from Mellanox!
- Mellanox IB switch
- Solaris 11
- GlusterFS 3.3 compiled with ib_verbs
- Gluster volume with transport rdma only

>> Throughput is constant at up to 200 MByte/s; you get more throughput with
more storage servers or more hard drives in the JBOD.

Info:

- I have had problems with Infiniband RDMA or SRP on OpenIndiana, Illumos and
Nexenta. Some adapters have high latency or an unstable connection. Use
Solaris, that's the right way!
- OpenIndiana is beta! Infiniband ib_verbs does not work, or does not work
well!
- Use Solaris 11: Infiniband ib_verbs is native and stable.
- Don't use Ubuntu client or server for Infiniband! Use RedHat, Fedora or
CentOS and install the right drivers from
https://www.openfabrics.org/downloads/OFED/
- You have no SSD cache? Then disable sync on the ZFS volume, but important:
you lose data security, because some protocols set sync flags in transport.
For example, NFS uses fsync by default, so the write cache is not used and
NFS writes data straight to the hard drives. For data security and
performance, give the storage server an SSD write cache. ZFS defaults to
sync=standard, which prevents write holes (it is a COW system). See the
sketch below.
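
A hedged example of that sync setting (the dataset name is the placeholder
from above):

# Only without an SSD log device, and only if you accept losing the last
# few seconds of writes on a power failure:
zfs set sync=disabled tank/gfsvol

# With a mirrored SSD SLOG in place, keep the safe default:
zfs set sync=standard tank/gfsvol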

I hope I could help a little.

Greetings from Germany
Andreas



----- Original Message -----

Von: "Fabrice Brazier" <fabrice.brazier@apalia.net>
An: cloudstack-users@incubator.apache.org
Gesendet: Dienstag, 23. Oktober 2012 09:55:15
Betreff: RE: Primary Storage

Hi Andreas,

I just saw your configuration; it seems quite interesting.
If I understand correctly, you build a ZFS array on the backend and export
LUNs (probably via iSCSI over Infiniband) to your Linux cluster, and on the
Linux cluster you put GlusterFS.
I can see the point: with that you get very good performance and reliability
(ZFS), plus scalability and redundancy (Gluster), for a very low cost.
So just one question: did you try the global namespace implementation from
Nexenta?
If yes, can you tell me which configuration works best for you?
I mean, the fact that you have a Gluster cluster in the middle must impact
the overall performance, no?

Fabrice

-----Original Message-----
From: Andreas Huser [mailto:ahuser@7five-edv.de]
Sent: Tuesday, October 23, 2012 05:40
To: cloudstack-users@incubator.apache.org
Subject: Re: Primary Storage

Hi,

for CloudStack I use Solaris 11 ZFS + GlusterFS over Infiniband (RDMA). That
gives the best performance and the most scalable storage.
I have tested several different solutions for primary storage, but most are
too expensive, not economical for a CloudStack cluster, or perform poorly.

My configuration:
Storage node:
Supermicro server (Intel hardware) running Solaris 11, with SSD write and
read cache (read: Crucial M4, write: ZeusIOPS), GlusterFS, and a dual-port
ConnectX 40 Gbit/s Infiniband adapter.

I have installed GlusterFS directly on Solaris with modified code.
If you want to build bigger systems for more than 50 VMs, it is better to
split Solaris and GlusterFS and give GlusterFS a separate head node.

That looks like:
Solaris ZFS backend storage with a thin-provisioned dataset volume --> (SRP
target attached directly, without an Infiniband switch, to the GlusterFS
node) --> GlusterFS node: the SRP target formatted with an XFS filesystem,
create a GlusterFS volume --> (Infiniband over a Mellanox port switch) -->
CloudStack node: mount the GlusterFS volume over RDMA

For the dataset volume on the ZFS storage, disable atime and enable
compression.
(Space reclaim) With compression you can shrink the ZFS volume with dd from
/dev/zero on Linux, or with sdelete in a Windows VM. That frees up space on
the primary storage for files deleted inside a VM, or for VHDs and VMs
deleted in CloudStack.
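
A hedged example of that reclaim step (the file name and drive letter are
placeholders; sdelete is the Sysinternals tool):

# Inside a Linux guest: fill the free space with zeros, then delete the file.
# dd stops with "no space left" once the disk is full, which is expected;
# the zeroed blocks compress to almost nothing on the compressed ZFS volume.
dd if=/dev/zero of=/zerofile bs=1M
sync
rm -f /zerofile

# Inside a Windows guest: zero the free space on C:
sdelete -z c: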

Greetings, Andreas




Kind regards

Andreas Huser
Managing Director
System Engineer / Consultant
(Cisco CSE, SMBAM, LCSE, ASAM)
---------------------------------------
Zellerstraße 28 - 77654 Offenburg
Tel: +49(781) 12786898
Mobil: +49(176) 10308549
ahuser@7five-edv.de




----- Original Message -----

Von: "Outback Dingo" <outbackdingo@gmail.com>
An: cloudstack-users@incubator.apache.org
Gesendet: Dienstag, 23. Oktober 2012 02:15:16
Betreff: Re: Primary Storage

On Mon, Oct 22, 2012 at 8:09 PM, Ivan Rodriguez <ivanoch@gmail.com> wrote:
> Solaris 11 ZFS and yes we tried different setups, raids levels number
> of SSD cache, ARC zfs options etc etc etc.
>
> Cheers
>

VMWare ??

> On Tue, Oct 23, 2012 at 11:05 AM, Outback Dingo
> <outbackdingo@gmail.com>wrote:
>
>> On Mon, Oct 22, 2012 at 8:03 PM, Ivan Rodriguez <ivanoch@gmail.com>
>> wrote:
>> > We are using ZFS, with JBOD, not in production yet, exporting NFS to
>> > cloudstack. I'm not really happy about the performance, but I think it is
>> > related to the hardware itself rather than the technology. We are using
>> > Intel SR2625UR and Intel 320 SSDs. We were evaluating gluster as well,
>> > but we decided to move away from that path since gluster NFS is still
>> > performing poorly, plus we would like to see cloudstack integrating the
>> > gluster-fuse module. We haven't decided the final storage setup, but at
>> > the moment we had better results with ZFS.
>> >
>> >
>>
>> The question is whose ZFS, and have you "tweaked" the zfs/nfs config for
>> performance?
>>
>> >
>> > On Tue, Oct 23, 2012 at 10:44 AM, Nik Martin <nik.martin@nfinausa.com
>> >wrote:
>> >
>> >> On 10/22/2012 05:49 PM, Trevor Francis wrote:
>> >>
>> >>> ZFS looks really interesting to me and I am leaning that way. I am
>> >>> considering using FreeNAS, as people seem to be having good luck with
>> >>> it. Can anyone weigh in here?
>> >>>
>> >>>
>> >> My personal opinion, I think FreeNAS and OpenFiler have horrible,
>> >> horrible User Interfaces - not very intuitive, and they both seem to be
>> >> file servers with things like iSCSI targets tacked on as an afterthought.
>> >>
>> >> Nik
>> >>
>> >>
>> >>> Trevor Francis
>> >>> Partner
>> >>> 46 Labs | The PeerEdge Cloud
>> >>> http://www.46labs.com | http://www.peeredge.net
>> >>> 405-362-0046 - Voice | 405-410-4980 - Cell
>> >>> trevorgfrancis - Skype
>> >>> trevor@46labs.com <mailto:trevor@46labs.com>
>> >>> Solutions Provider for the Telecom Industry
>> >>>
>> >>> <http://www.twitter.com/peeredge> <http://www.facebook.com/PeerEdge>
>> >>>
>> >>> On Oct 22, 2012, at 2:30 PM, Jason Davis wrote:
>> >>>
>> >>>> ZFS would be an interesting setup as you can do the cache pools like
>> >>>> you would do in CacheCade. The problem with ZFS or CacheCade+DRBD is
>> >>>> that they really don't scale out well if you are looking for something
>> >>>> with a unified name space. I'll say however that ZFS is a battle
>> >>>> hardened FS with tons of shops using it. A lot of the whiz-bang
>> >>>> SSD+SATA disk SAN things these smaller start up companies are hocking
>> >>>> are just ZFS appliances.
>> >>>>
>> >>>> RBD looks interesting but I'm not sure if I would be willing to put
>> >>>> production data on it, I'm not sure how performant it is IRL. From a
>> >>>> purely technical perspective, it looks REALLY cool.
>> >>>>
>> >>>> I suppose anything is fast if you put SSDs in it :) GlusterFS is
>> >>>> another option, although historically small/random IO has not been its
>> >>>> strong point.
>> >>>>
>> >>>> If you are ok spending money on software and want a scale out block
>> >>>> storage then you might want to consider HP LeftHand's VSA product. I
>> >>>> am personally partial to NFS plays :) I went the exact opposite
>> >>>> approach and settled on Isilon for our primary storage for our CS
>> >>>> deployment.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Oct 22, 2012 at 10:24 AM, Nik Martin
>> >>>> <nik.martin@nfinausa.com> wrote:
>> >>>>
>> >>>> On 10/22/2012 10:16 AM, Trevor Francis wrote:
>> >>>>>
>> >>>>>> We are looking at building a Primary Storage solution for an
>> >>>>>> enterprise/carrier class application. However, we want to build it
>> >>>>>> using a FOSS solution and not a commercial solution. Do you have a
>> >>>>>> recommendation on platform?
>> >>>>>>
>> >>>>>>
>> >>>>>> Trevor,
>> >>>>>
>> >>>>> I got EXCELLENT results building a SAN from FOSS using:
>> >>>>> OS: CentOS
>> >>>>> Hardware: 2x storage servers, with 12x 2TB 3.5" SATA drives, LSI
>> >>>>> MegaRAID with CacheCade Pro, with 240 GB Intel 520 SSDs configured to
>> >>>>> do SSD caching (alternately, look at FlashCache from Facebook),
>> >>>>> Intel 10GB dual-port NICs, one port for crossover, one port for
>> >>>>> uplink to the storage network
>> >>>>>
>> >>>>> DRBD for real-time block replication (active-active)
>> >>>>> Pacemaker+Corosync for HA resource management
>> >>>>> tgtd for iSCSI target
>> >>>>>
>> >>>>> If you want file-backed storage, XFS is a very good filesystem on
>> >>>>> Linux now.
>> >>>>>
>> >>>>> Pacemaker+Corosync can be difficult to grok at the beginning, but
>> >>>>> that setup gave me a VERY high performance SAN. The downside is it is
>> >>>>> entirely managed by CLI, no UI whatsoever.
>> >>>>>
>> >>>>>
>> >>>>> Trevor Francis
>> >>>>>> Partner
>> >>>>>> 46 Labs | The PeerEdge Cloud
>> >>>>>> http://www.46labs.com | http://www.peeredge.net
>> >>>>>>
>> >>>>>> 405-362-0046 - Voice | 405-410-4980 - Cell
>> >>>>>> trevorgfrancis - Skype
>> >>>>>> trevor@46labs.com
>> >>>>>>
>> >>>>>>
>> >>>>>> Solutions Provider for the Telecom Industry
>> >>>>>>
>> >>>>>> <http://www.twitter.com/peeredge> <http://www.facebook.com/PeerEdge>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>>
