cloudstack-users mailing list archives

From Jason Davis <scr...@gmail.com>
Subject Re: Primary Storage
Date Tue, 23 Oct 2012 13:20:03 GMT
Out of curiosity, are there any quick performance numbers for these ZFS +
GlusterFS mashups you guys are talking about?

Specifically, IOPS and latency? Sequential read/write performance honestly
isn't a very good benchmark of your SAN's performance; it's like comparing
CPUs based solely on how many GHz they run at. Sure, you can get great MB/s
or GB/s with SATA disks, but I'd reckon the IOPS performance is abysmal. If
you are utilizing GlusterFS without the cache-pooling magic that is ZFS,
then I would imagine that latency can be an issue.
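
As a rough way to put numbers behind that, something like fio measures random
IOPS and latency rather than sequential MB/s. A minimal sketch, assuming the
storage is mounted at /mnt/primary (path, file size and runtime are made up):

    fio --name=randrw --filename=/mnt/primary/fio.test --size=4G \
        --rw=randrw --rwmixread=70 --bs=4k --ioengine=libaio --iodepth=32 \
        --direct=1 --runtime=60 --time_based --group_reporting

The completion-latency percentiles in the output are usually more telling
than the headline bandwidth figure.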



On Tue, Oct 23, 2012 at 7:56 AM, Andreas Huser <ahuser@7five-edv.de> wrote:

> Hi Fabrice,
>
> I know, OpenSolaris/Solaris and Oracle is quite a thing.
> I have been an open source user for more than 10 years, and at the beginning
> I did not like this constellation with Oracle. But Oracle does its job well,
> I have to admit. The cost is about $700 per socket, you can use as many TB
> as you want, and you get full Oracle Premier Support.
> Nexenta develops on the Illumos code, and its licence is TB-based, which is
> not my favourite. The Nexenta pool version also lags behind. At the moment
> Nexenta with Infiniband is not a usable solution.
> But everyone can use what they like; everyone must decide for themselves.
>
> SRP targets or iSER are not difficult to configure. Use SRP for the
> connection inside the storage unit: Solaris and GlusterFS together build one
> storage unit, and the GlusterFS server exports the final volume to the
> clients (KVM, VMware, Hyper-V etc.).
> You can use native GlusterFS, RDMA, NFS or CIFS to export the volume.
> SRP has nothing to do with VMware.
>
> When you use a 7200 rpm SAS drive, the access time is the same as with a
> SATA drive; only the quality of the hardware is better. If you need
> performance you must use 15,000 rpm SAS drives, but that is not needed once
> you install SSDs for ZIL/L2ARC (for example, as sketched below). ZeusRAM
> rocks :-)
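>
> A minimal sketch of adding those cache devices to an existing pool, assuming
> a pool named tank and made-up device names:
>
>     # mirror the SSD write cache (ZIL / slog), e.g. a ZeusRAM/ZeusIOPS pair
>     zpool add tank log mirror c2t0d0 c2t1d0
>     # add an SSD read cache (L2ARC), e.g. a Crucial M4
>     zpool add tank cache c2t2d0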
>
> I use dedup only on secondary storage or on the backup server, never on
> primary storage.
> With SSDs plus SATA drives you get cheap and fast storage; a 1 TB drive
> costs under $100. At the moment I have no need to save storage capacity.
>
> Which application uses atime? I know "find -atime N", but atime on storage
> that holds nothing except virtual disks?
> I don't need to know when I last opened a disk :-)
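>
> The corresponding ZFS settings are one-liners; a sketch assuming a dataset
> named tank/primary (the name is made up):
>
>     zfs set atime=off tank/primary
>     zfs set dedup=off tank/primary   # leave dedup for secondary/backup pools
>     zfs get atime,dedup tank/primary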
>
> For a Solaris GUI take a look at napp-it
> http://www.napp-it.org/
>
> Greetings from Germany
> Andreas
>
>
>
>
> ----- Original Message -----
>
> From: "Fabrice Brazier" <fabrice.brazier@apalia.net>
> To: cloudstack-users@incubator.apache.org
> Sent: Tuesday, 23 October 2012 12:30:50
> Subject: RE: Primary Storage
>
> Hi Andreas,
>
> Hmm, that's pretty cool. I know they still have trouble with Infiniband on
> Nexenta, but it's clearly a priority on their roadmap (plus I trust the
> Nexenta team more than Oracle to evolve ZFS).
>
> I agree that iSCSI over Infiniband increases latency, but most of the time
> it's simply easier to use IPoIB than native IB. For example, if you use
> VMware, Infiniband support is provided by Mellanox and not by VMware, so if
> you have an issue, VMware support probably won't help you any more.
>
> About the RAID, I'm a fan of RAID 10; I'd rather build a RAID 10 of 7200 rpm
> SAS drives than multiple RAIDZ/RAIDZ2 vdevs of 15k SAS drives, particularly
> for a virtual environment with a random workload and concurrent access.
>
> I'm a fan of NFS, so agreed about the ZIL, and ZeusRAM is the only real
> option for that (with a classical SSD you can hit the write hole).
>
> Agreed on compression too (but only lzjb; the gzip levels use too much CPU).
> Disabling atime does reduce the IOPS load, but I'm not sure it's really a
> best practice. About deduplication I don't totally agree: it really depends
> on your array, the workload and the VM types in your CloudStack. When I
> build a ZFS array I budget 1 GB of RAM per 1 TB of disk; with deduplication
> I budget 2 GB of RAM per 1 TB of disk (plus in Nexenta 4 they will add the
> possibility to use SSDs for the deduplication metadata, like the L2ARC for
> the read cache). A rough example of checking this is sketched below.
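>
> A sketch of how one might sanity-check those numbers on an existing pool
> (the pool name tank is made up; ~320 bytes per DDT entry is only a common
> rule of thumb):
>
>     zfs get compressratio tank
>     zdb -S tank   # simulate dedup: prints a DDT histogram and estimated ratio
>     zdb -D tank   # on a pool that already dedups: entries x ~320 B ~= RAM needed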
>
> The last point is about your drives: why SATA? SATA drives don't keep
> consistent latency, and the reliability of a SAS drive is roughly 10x that
> of SATA. Plus you can now find plenty of NL-SAS drives at low cost.
>
> But it's really a nice architecture. I have never tried GlusterFS (yet), and
> in that case it's really a good way to get a replacement for a metro-cluster
> for free. I have tried one of its competitors (OneFS), and clearly clustered
> filesystems are the future.
>
> Cheers,
> Fabrice
>
> -----Original Message-----
> From: Andreas Huser [mailto:ahuser@7five-edv.de]
> Sent: Tuesday, 23 October 2012 11:37
> To: cloudstack-users@incubator.apache.org
> Subject: Re: Primary Storage
>
> Hi Fabrice,
>
> I don't know what other people do, but I have no problems with Infiniband +
> GlusterFS + CloudStack. I don't use Nexenta: it is based on Illumos and does
> not work well with Infiniband.
> I have two different clusters in production environments.
>
> The first: Solaris 11 with GlusterFS 3.3 built in, exporting the Gluster
> volume over RDMA. Performance is okay; you can use that for smaller
> environments. The second is a bit more complex, with a GlusterFS server in
> the middle:
>
> ZFS server: based on Solaris 11
> 1.) Create a zpool with at least two vdevs and SSD read/write cache
> 2.) Create a thin-provisioned volume ("zfs create -V"), disable atime,
> enable compression (do not enable dedup!) and export it as an SRP target to
> the GlusterFS server; use a direct connection without an IB switch. (A
> minimal sketch of these steps follows below.)
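>
> A minimal sketch of the ZFS side, assuming Solaris 11, a pool named tank and
> made-up device names (the LU GUID is a placeholder):
>
>     zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
>     zpool add tank log mirror c2t0d0 c2t1d0     # SSD write cache (ZIL)
>     zpool add tank cache c2t2d0                 # SSD read cache (L2ARC)
>     zfs create -s -V 2T tank/gvol               # thin-provisioned (sparse) volume
>     zfs set compression=lzjb tank/gvol          # dedup stays off
>     # export the zvol over SRP via COMSTAR:
>     svcadm enable -r svc:/system/stmf:default
>     svcadm enable -r svc:/system/ibsrp/target:default
>     stmfadm create-lu /dev/zvol/rdsk/tank/gvol  # note the GUID it prints
>     stmfadm add-view 600144F0...                # the GUID from the previous step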
>
> GlusterFS server:
> 1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6)
> 2.) Use the OFED driver from https://www.openfabrics.org
> 3.) Import the SRP target from the ZFS server and format it as XFS
> 4.) Create a Gluster volume, "volume create xy transport rdma" (use only
> rdma); see the sketch below
> 5.) Connect the second IB port to an IB switch
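>
> A sketch of steps 3 and 4, assuming the SRP LUN shows up as /dev/sdb, the
> brick lives under /bricks/gvol and the server is called gfs01 (all names
> made up):
>
>     mkfs.xfs -f /dev/sdb
>     mkdir -p /bricks/gvol && mount /dev/sdb /bricks/gvol
>     gluster volume create gvol transport rdma gfs01:/bricks/gvol
>     gluster volume start gvol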
>
> CloudStack hypervisor node:
> 1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6)
> 2.) Use the OFED driver from https://www.openfabrics.org
> 3.) Mount the Gluster volume (see below)
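>
> For example (hypothetical host name and mount point):
>
>     mount -t glusterfs -o transport=rdma gfs01:/gvol /mnt/primary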
>
> ZFS thin volume ---- Infiniband SRP ----> GlusterFS ---- GFS vol, rdma ---->
> IB switch ----> clients
>
> The ZFS and GlusterFS servers form one storage unit, connected directly with
> 40 Gbit Infiniband point-to-point. You do not feel that there is a cable in
> between!
>
> Important: when you have Infiniband, do not use IPoIB with iSCSI! If you
> already have Infiniband, then you should also use its advantage:
> IPoIB has higher latency than SRP!
>
>
> SRP latency (usec):
>
> -- SRP --
> local address: LID 0x01 QPN 0x44004b PSN 0xf3265b RKey 0x9804237c VAddr
> 0x00000001dda000 remote address: LID 0x0a QPN 0x10004a PSN 0x44072e RKey
> 0x1c0f115 VAddr 0x000000088e6000
> ------------------------------------------------------------------
> #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]
> 2       1000         1.29         125.30       1.31
> ------------------------------------------------------------------
>
> -- IPoIB ---
> [root@sv01sfogaa ~]# ping 10.200.0.10
> PING 10.200.0.10 (10.200.0.10) 56(84) bytes of data.
> 64 bytes from 10.200.0.10: icmp_seq=1 ttl=255 time=0.147 ms
> 64 bytes from 10.200.0.10: icmp_seq=2 ttl=255 time=0.116 ms
>
> When you put load on IPoIB, the latency increases, which is not good.
>
>
> That is my recommendation for a simple GlusterFS mirror:
>
> - Supermicro server with Intel hardware and an expander backplane
> - 1x Crucial M4 SSD read cache
> - 2x ZeusIOPS SSD write cache (mirrored)
> - SATA 24/7 hard drives
> - LSI HBA 9207 or 9211
> - ConnectX-2 QDR dual-port Infiniband adapter (HP refurbished with full
> warranty for $100); important: flash the newest firmware from Mellanox!
> - Mellanox IB switch
> - Solaris 11
> - GlusterFS 3.3 compiled with ib_verbs
> - Gluster volume transport only rdma
>
>
>
>
> Throughput is constant at up to 200 MByte/s; you get more throughput with
> more storage servers or more hard drives in the JBOD.
>
>
> Info:
>
> - I have had some problems with Infiniband RDMA or SRP on OpenIndiana,
> Illumos or Nexenta: some adapters have high latency or no stable connection.
> Use Solaris, that's the right way!
> - OpenIndiana is beta! Infiniband ib_verbs does not work, or does not work
> well!
> - On Solaris 11, Infiniband ib_verbs is native and stable.
> - Don't use Ubuntu as client or server for Infiniband! Use RedHat, Fedora or
> CentOS and install the right drivers from
> https://www.openfabrics.org/downloads/OFED/
> - You have no SSD cache? Then disable sync on the ZFS volume! Important: you
> lose data security, but some protocols use sync flags in transport. For
> example NFS uses fsync by default; the write cache is not active and NFS
> writes data directly to the hard drives. For data security and performance,
> give the storage server an SSD write cache. ZFS works by default with
> sync=standard, which prevents write holes (COW system). (See the example
> below.)
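>
> For example, assuming a volume named tank/gvol (name made up):
>
>     zfs set sync=disabled tank/gvol   # only without an SSD slog, and only
>                                       # if you accept the risk of data loss
>     zfs set sync=standard tank/gvol   # safe default once a ZeusRAM/SSD log
>                                       # device is in place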
>
> I hope I could help a little.
>
> Greetings from Germany
> Andreas
>
>
>
> ----- Original Message -----
>
> From: "Fabrice Brazier" <fabrice.brazier@apalia.net>
> To: cloudstack-users@incubator.apache.org
> Sent: Tuesday, 23 October 2012 09:55:15
> Subject: RE: Primary Storage
>
> Hi Andreas,
>
> Hello, I just saw your configuration; it seems quite interesting.
> If I understand correctly, you want to build ZFS arrays on the backend,
> export LUNs (probably iSCSI over Infiniband) to your Linux cluster, and on
> the Linux cluster you put GlusterFS.
> I see the point: with that you can have very good performance and
> reliability (ZFS), plus scalability and redundancy (Gluster), for very low
> cost.
> So just one question: did you try the global namespace implementation from
> Nexenta?
> If yes, can you tell me which configuration worked best for you?
> I mean, the fact that you have a Gluster cluster in the middle must impact
> the overall performance, no?
>
> Fabrice
>
> -----Original Message-----
> From: Andreas Huser [mailto:ahuser@7five-edv.de]
> Sent: Tuesday, 23 October 2012 05:40
> To: cloudstack-users@incubator.apache.org
> Subject: Re: Primary Storage
>
> Hi,
>
> for CloudStack I use Solaris 11 ZFS + GlusterFS over Infiniband (RDMA). That
> gives the best performance and the most scalable storage.
> I have tested several different solutions for primary storage, but most are
> too expensive and not economical for a CloudStack cluster, or they perform
> poorly.
>
> My configuration:
> Storage node:
> Supermicro server (Intel hardware) with Solaris 11, SSD write and read cache
> (read: Crucial M4, write: ZeusIOPS), GlusterFS, and a dual-port ConnectX
> 40 Gbit/s Infiniband adapter.
>
> I have installed GlusterFS directly on Solaris with modified code.
> If you want to build bigger systems for more than 50 VMs, it is better to
> split Solaris and GlusterFS, with a separate head node for GlusterFS.
>
> That looks like:
> Solaris ZFS backend storage with a dataset volume (thin provisioned) -->
> (SRP target attached directly, without an Infiniband switch, to the GF node)
> --> GlusterFS node: the SRP target formatted with an XFS filesystem, create
> a GlusterFS volume --> (Infiniband over a Mellanox port switch) -->
> CloudStack node mounts the GlusterFS volume over RDMA
>
> For the dataset volume on the ZFS storage, disable atime and enable
> compression (space reclaim). With compression you can shrink the ZFS volume
> with dd /dev/zero on Linux or with sdelete in a Windows VM (see below). That
> gives you space back on the primary storage for files deleted inside a VM,
> or for VHDs or VMs deleted in CloudStack.
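>
> A sketch of that reclaim trick (paths and names are made up):
>
>     # inside a Linux guest: fill free space with zeros, then delete the file
>     dd if=/dev/zero of=/zerofile bs=1M; rm -f /zerofile
>     # inside a Windows guest (Sysinternals): zero free space
>     sdelete -z c:
>     # on the ZFS host the zeroed blocks compress away to almost nothing
>     zfs get used,compressratio tank/gvol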
>
> Greetings, Andreas
>
>
>
>
> Kind regards
>
> Andreas Huser
> Managing Director
> System Engineer / Consultant
> (Cisco CSE, SMBAM, LCSE, ASAM)
> ---------------------------------------
> Zellerstraße 28 - 77654 Offenburg
> Tel: +49(781) 12786898
> Mobil: +49(176) 10308549
> ahuser@7five-edv.de
>
>
>
>
> ----- Original Message -----
>
> From: "Outback Dingo" <outbackdingo@gmail.com>
> To: cloudstack-users@incubator.apache.org
> Sent: Tuesday, 23 October 2012 02:15:16
> Subject: Re: Primary Storage
>
> On Mon, Oct 22, 2012 at 8:09 PM, Ivan Rodriguez <ivanoch@gmail.com> wrote:
>
> Solaris 11 ZFS, and yes, we tried different setups: RAID levels, number of
> SSD cache devices, ARC/ZFS options, etc.
>
> Cheers
>
>
> VMWare ??
>
>
>
> On Tue, Oct 23, 2012 at 11:05 AM, Outback Dingo
> <outbackdingo@gmail.com>wrote:
>
>
> On Mon, Oct 22, 2012 at 8:03 PM, Ivan Rodriguez <ivanoch@gmail.com>
> wrote:
> > We are using ZFS, with JBOD, not in production yet, exporting NFS to
> > CloudStack. I'm not really happy about the performance, but I think that
> > is related to the hardware itself rather than the technology. We are
> > using Intel SR2625UR and Intel 320 SSDs. We were evaluating Gluster as
> > well, but we decided to move away from that path since Gluster NFS is
> > still performing poorly; plus we would like to see CloudStack integrate
> > the gluster-fuse module. We haven't decided on the final storage setup,
> > but at the moment we had better results with ZFS.
> >
> >
>
> Question is: whose ZFS, and have you "tweaked" the ZFS/NFS config for
> performance?
>
> >
> > On Tue, Oct 23, 2012 at 10:44 AM, Nik Martin <nik.martin@nfinausa.com
> >wrote:
> >
> >> On 10/22/2012 05:49 PM, Trevor Francis wrote:
> >>
> >>> ZFS looks really interesting to me and I am leaning that way. I am
> >>> considering using FreeNAS, as people seem to be having good luck with
> >>> it. Can anyone weigh in here?
> >>>
> >>>
> >> My personal opinion: I think FreeNAS and OpenFiler have horrible,
> >> horrible user interfaces - not very intuitive, and they both seem to be
> >> file servers with things like iSCSI targets tacked on as an afterthought.
> >>
> >> Nik
> >>
> >>
> >>> Trevor Francis
> >>> Partner
> >>> 46 Labs | The PeerEdge Cloud
> >>> http://www.46labs.com | http://www.peeredge.net
> >>> 405-362-0046 - Voice | 405-410-4980 - Cell
> >>> trevorgfrancis - Skype
> >>> trevor@46labs.com <mailto:trevor@46labs.com>
> >>> Solutions Provider for the Telecom Industry
> >>>
> >>> http://www.twitter.com/peeredge | http://www.facebook.com/PeerEdge
> >>>
> >>> On Oct 22, 2012, at 2:30 PM, Jason Davis wrote:
> >>>
> >>>> ZFS would be an interesting setup, as you can do the cache pools like
> >>>> you would do in CacheCade. The problem with ZFS or CacheCade+DRBD is
> >>>> that they really don't scale out well if you are looking for something
> >>>> with a unified namespace. I'll say, however, that ZFS is a
> >>>> battle-hardened FS with tons of shops using it. A lot of the whiz-bang
> >>>> SSD+SATA disk SAN things these smaller start-up companies are hawking
> >>>> are just ZFS appliances.
> >>>>
> >>>> RBD looks interesting, but I'm not sure I would be willing to put
> >>>> production data on it; I'm not sure how performant it is IRL. From a
> >>>> purely technical perspective, it looks REALLY cool.
> >>>>
> >>>> I suppose anything is fast if you put SSDs in it :) GlusterFS is
> >>>> another option, although historically small/random IO has not been its
> >>>> strong point.
> >>>>
> >>>> If you are OK spending money on software and want scale-out block
> >>>> storage, then you might want to consider HP LeftHand's VSA product. I
> >>>> am personally partial to NFS plays :) I went the exact opposite way and
> >>>> settled on Isilon for our primary storage for our CS deployment.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Oct 22, 2012 at 10:24 AM, Nik Martin <nik.martin@nfinausa.com> wrote:
> >>>>
> >>>> On 10/22/2012 10:16 AM, Trevor Francis wrote:
> >>>>>
> >>>>>> We are looking at building a Primary Storage solution for an
> >>>>>> enterprise/carrier-class application. However, we want to build it
> >>>>>> using a FOSS solution and not a commercial solution. Do you have a
> >>>>>> recommendation on platform?
> >>>>>>
> >>>>>>
> >>>>>> Trevor,
> >>>>>
> >>>>> I got EXCELLENT results building a SAN from FOSS using:
> >>>>> OS: CentOS
> >>>>> Hardware: 2x storage servers, with 12x 2 TB 3.5" SATA drives, LSI
> >>>>> MegaRAID with CacheCade Pro, with 240 GB Intel 520 SSDs configured to
> >>>>> do SSD caching (alternately, look at FlashCache from Facebook),
> >>>>> Intel 10 GB dual-port NICs, one port for crossover, one port for
> >>>>> uplink to the storage network
> >>>>>
> >>>>> DRBD for real-time block replication, active-active
> >>>>> Pacemaker+Corosync for HA resource management
> >>>>> tgtd for iSCSI target (a minimal sketch follows)
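> >>>>>
> >>>>> A minimal sketch of the tgtd piece, assuming the DRBD device is
> >>>>> /dev/drbd0 and a made-up IQN (in the real setup Pacemaker would manage
> >>>>> these resources):
> >>>>>
> >>>>>     tgtadm --lld iscsi --op new --mode target --tid 1 \
> >>>>>         -T iqn.2012-10.com.example:san.lun1
> >>>>>     tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
> >>>>>         -b /dev/drbd0
> >>>>>     tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL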
> >>>>>
> >>>>> If you want file-backed storage, XFS is a very good filesystem on
> >>>>> Linux now.
> >>>>>
> >>>>> Pacemaker+Corosync can be difficult to grok at the beginning, but that
> >>>>> setup gave me a VERY high-performance SAN. The downside is that it is
> >>>>> entirely managed by CLI, no UI whatsoever.
> >>>>>
> >>>>>
> >>>>> Trevor Francis
> >>>>>> Partner
> >>>>>> 46 Labs | The PeerEdge Cloud
> >>>>>> http://www.46labs.com | http://www.peeredge.net
> >>>>>>
> >>>>>> 405-362-0046 - Voice | 405-410-4980 - Cell
> >>>>>> trevorgfrancis - Skype
> >>>>>> trevor@46labs.com
> >>>>>>
> >>>>>>
> >>>>>> Solutions Provider for the Telecom Industry
> >>>>>>
> >>>>>> http://www.twitter.com/peeredge | http://www.facebook.com/PeerEdge
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
>
>
> </blockquote>
>
>
> </blockquote>
>
