cloudstack-users mailing list archives

From Jason Davis <scr...@gmail.com>
Subject Re: Primary Storage
Date Wed, 24 Oct 2012 00:17:10 GMT
How often do you see cache misses that end up hitting the spinning rust
below the CacheCade tier?
On Oct 23, 2012 6:25 PM, <bruce.m@v365.com.au> wrote:

> Hi,
>
> With our SCST RDMA setup we have 40Gb/s QDR Gen 2, with Mellanox Grid
> Director 4036 switches plus HP Gen 2 QDR chassis switches.
>
> Our SAN achieves 295,000 - 350,000 IOPS max per LSI controller,
> depending on block size, in IOmeter tests: LSI 9280-8i with
> battery-backed cache plus CacheCade 2.0 x 2, in a PCIe x16 slot on a
> Supermicro motherboard.
>
> 1 x 250GB SSD for CacheCade + 16 x 1TB WD VelociRaptor drives
> (16, less hot standby, less 2 for RAID 6 parity) = 2600 MB/s.
>
> Most of our tests are from 4K to 128K block size: at 4K we get
> 295,000 IOPS at 1590 MB/s, and at 128K 350,000 IOPS at 2400~2600 MB/s.
>
> We have tuned 64K and 128K block sizes on different LUNs: 64K for
> database and 128K for general file serving.
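>
> (For reference, a roughly comparable load can be generated with fio
> instead of IOmeter; a minimal sketch, where the device path, queue
> depth and run time are assumptions rather than the actual IOmeter
> profile used above:)
>
>   # 4K random read against the raw LUN for ~60 s; repeat with --bs=128k
>   fio --name=randread-4k --filename=/dev/sdX --direct=1 --rw=randread \
>       --bs=4k --ioengine=libaio --iodepth=64 --numjobs=8 \
>       --runtime=60 --time_based --group_reporting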
>
> The best thing about Infiniband is the low CPU overhead: only 5~6%
> during these tests.
>
> Latency is as low as 4-6 ms average read time; concurrent response
> times are 5-12 ms, and even under heavy load it stays below 20 ms.
>
> Infiniband link latency is below 0.01 us, which is why we chose it.
>
> We run MSSQL on the 64K-formatted LUNs; it is massively fast.
>
> If we copy the same data twice, CacheCade kicks in and we achieve
> even better speeds.
>
> I've compared this to block I/O over 8Gb/s Fibre Channel, and it
> barely gets 120,000 IOPS, at much higher latency and lower bandwidth.
>
> So for our money, RDMA wins!
>
> Bruce
>
> On 23.10.2012 21:20, Jason Davis wrote:
> > Out of curiosity, are there any quick performance numbers for these
> > ZFS + GlusterFS mashups you guys are talking about?
> >
> > Specifically, IOPS and latency? Sequential read/write performance
> > honestly isn't a very good benchmark of a SAN's performance. It's
> > like comparing CPUs based solely on how many GHz they run at. Sure,
> > you can get great MB/s or GB/s with SATA disks, but I'd reckon the
> > IOPS performance is abysmal. If you are utilizing GlusterFS without
> > the cache pooling magic that is ZFS, then I would imagine that
> > latency can be an issue.
> >
> > On Tue, Oct 23, 2012 at 7:56 AM, Andreas Huser wrote:
> >
> >> Hi Fabrice,
> >>
> >> I know, the OpenSolaris/Solaris and Oracle situation is quite a
> >> thing. I have been an open source user for more than 10 years, and I
> >> did not like this constellation at the beginning. But Oracle does its
> >> work well, I know that. The cost is $700 per socket, you can use as
> >> many TB as you want, and you get the full Premier Support from
> >> Oracle. Nexenta develops on the Illumos code, and its licence is
> >> TB-based. That is not my favourite, and the Nexenta pool version lags
> >> behind. Current Nexenta Infiniband is not a usable solution. But
> >> everyone can use what they like; everyone must decide for themselves.
> >>
> >> SRP targets or iSER are not difficult to configure. Use SRP for the
> >> storage unit connection; Solaris and GlusterFS together build one
> >> storage unit. The GlusterFS server exports the final volume to the
> >> clients, e.g. KVM, VMware, Hyper-V etc. You can use native GlusterFS,
> >> RDMA, NFS or CIFS to export the volume. SRP has nothing to do with
> >> VMware.
> >>
> >> If you use a 7200 rpm SAS drive, the access time is the same as a
> >> SATA drive; only the hardware quality is better. When you need
> >> performance you must use 15,000 rpm SAS drives. But that is not
> >> needed when you install SSDs for ZIL/L2ARC. ZeusRAM rocks :-)
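> >>
> >> (For illustration, adding ZIL and L2ARC devices to an existing pool
> >> looks roughly like this; the pool and device names are placeholders:)
> >>
> >>   # mirrored ZIL, e.g. two ZeusRAM/ZeusIOPS SSDs
> >>   zpool add tank log mirror c4t0d0 c4t1d0
> >>   # single L2ARC read-cache SSD, e.g. a Crucial m4
> >>   zpool add tank cache c4t2d0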
> >>
> >> I use dedup only on secondary storage or on the backup server, not
> >> on primary storage.
> >> When you use SATA SSD drives you get cheap and fast storage; a 1TB
> >> drive costs under $100. Currently I have no need to save storage
> >> volume.
> >>
> >> Which application uses atime? I know "find -atime N", but atime on a
> >> storage that holds only virtual disks? I don't need to know when I
> >> last opened a disk :-)
> >>
> >> For a Solaris GUI take a look at napp-it:
> >> http://www.napp-it.org/
> >>
> >> Greetings from Germany
> >> Andreas
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >>
> >> From: "Fabrice Brazier"
> >> To: cloudstack-users@incubator.apache.org
> >> Sent: Tuesday, 23 October 2012 12:30:50
> >> Subject: RE: Primary Storage
> >>
> >> Hi Andreas,
> >>
> >> Hmm, that's pretty cool. I know they still have trouble with
> >> Infiniband on Nexenta, but it's clearly a priority on their roadmap
> >> (plus I trust the Nexenta team more than Oracle to evolve ZFS).
> >>
> >> I agree that iSCSI over Infiniband increases the latency, but most
> >> of the time it's simply easier to use IPoIB than native IB. For
> >> example, if you use VMware, the Infiniband support is provided by
> >> Mellanox and not by VMware, so if you have an issue, VMware support
> >> probably won't help you any more.
> >>
> >> About the RAID, I'm a fan of RAID 10; I would rather build a RAID 10
> >> from 7200 rpm SAS drives than multiple raidz/raidz2 vdevs of 15k SAS
> >> drives, particularly for a virtual environment with a random
> >> workload and many concurrent accesses.
> >>
> >> I'm a fan of NFS, so agreed about the ZIL, and ZeusRAM is the only
> >> real option for that (with a classical SSD you can hit the write
> >> hole).
> >>
> >> Agreed on compression too (but only lzjb; the gzip levels use too
> >> much CPU). Disabling atime reduces the IOPS load, but I'm not sure
> >> it is really a best practice. About deduplication I don't totally
> >> agree: it really depends on your array, the workload and the VM
> >> types in your CloudStack. When I build a ZFS array I count 1 GB of
> >> RAM per 1 TB of disk; with deduplication I count 2 GB of RAM per
> >> 1 TB of disk (plus, in Nexenta 4 they will add the possibility to
> >> keep the deduplication metadata on SSD, like the L2ARC for the read
> >> cache).
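> >>
> >> (A rough way to sanity-check whether dedup would pay off before
> >> enabling it is to simulate the dedup table; a minimal sketch, with
> >> the pool name as a placeholder:)
> >>
> >>   # simulate deduplication on an existing pool and print a DDT histogram
> >>   zdb -S tank
> >>   # the summary reports the estimated dedup ratio; the simulated DDT
> >>   # entry count (at roughly a few hundred bytes per entry) gives a
> >>   # ballpark figure for the extra RAM dedup would need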
> >>
> >> The last point is about your drives: why SATA drives? SATA drives do
> >> not keep to their stated latency, and the reliability of a SAS drive
> >> is 10x that of SATA. Plus you can now find many NL-SAS drives at low
> >> cost.
> >>
> >> But it's really a nice architecture. I have never tried GlusterFS
> >> (for the moment); in that case it is really a good way to get a
> >> replacement for a metro-cluster for free. I have tried one of its
> >> competitors (OneFS), and clearly clustered filesystems are the
> >> future.
> >>
> >> Cheers,
> >> Fabrice
> >>
> >> -----Original Message-----
> >> From: Andreas Huser [mailto:ahuser@7five-edv.de]
> >> Sent: Tuesday, 23 October 2012 11:37
> >> To: cloudstack-users@incubator.apache.org
> >> Subject: Re: Primary Storage
> >>
> >> Hi Fabrice,
> >>
> >> I don't know what other people do, but I have no problems with
> >> Infiniband + GlusterFS + CloudStack. I don't use Nexenta; it is based
> >> on Illumos and does not work well with Infiniband.
> >> I have two different clusters in production environments.
> >>
> >> The first: Solaris 11 with built-in GlusterFS 3.3, exporting the
> >> Gluster volume with RDMA. Performance is okay; you can use that for
> >> smaller environments.
> >> The second is a little more complex, with a GlusterFS server in the
> >> middle.
> >>
> >> ZFS Server: based on Solaris 11
> >> 1.) Create a zpool with at least two vdevs and SSD read/write cache.
> >> 2.) Create a thin-provisioned volume ("zfs create -V"), disable atime
> >> and enable compression (do not enable dedup!), and export it as an
> >> SRP target to the GlusterFS server over a direct connection, without
> >> an IB switch (see the sketch below).
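> >>
> >> (A minimal sketch of steps 1-2 on Solaris 11; the pool, disk and
> >> volume names are placeholders, and the exact COMSTAR/SRP service
> >> names may differ between releases:)
> >>
> >>   # pool with two mirrored vdevs, a mirrored ZIL and an L2ARC device
> >>   zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
> >>       log mirror c2t0d0 c2t1d0 cache c2t2d0
> >>   zfs set atime=off tank
> >>   # sparse (thin-provisioned) zvol with compression, no dedup
> >>   zfs create -s -V 2T -o compression=lzjb tank/gfsvol
> >>   # expose the zvol through COMSTAR and enable the SRP target service
> >>   svcadm enable stmf
> >>   stmfadm create-lu /dev/zvol/rdsk/tank/gfsvol
> >>   stmfadm add-view <LU-GUID-printed-by-create-lu>
> >>   svcadm enable -r ibsrp/target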
> >>
> >>
> >> GlusterFS Server:
> >> 1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6).
> >> 2.) Use the OFED driver from https://www.openfabrics.org
> >> 3.) Import the SRP target from the ZFS server and format it as xfs.
> >> 4.) Create a Gluster volume, "volume create xy transport rdma" (use
> >> only rdma); see the sketch below.
> >> 5.) Connect the second IB port to an IB switch.
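> >>
> >> (A minimal sketch of steps 3-4; the device path, volume name and
> >> brick path are placeholders:)
> >>
> >>   # discover and attach the SRP target over the direct IB link
> >>   srp_daemon -e -o
> >>   # the imported LUN shows up as a local block device, e.g. /dev/sdb
> >>   mkfs.xfs /dev/sdb
> >>   mkdir -p /bricks/gfsvol && mount /dev/sdb /bricks/gfsvol
> >>   # single-brick Gluster volume using the rdma transport only
> >>   gluster volume create gfsvol transport rdma gfs01:/bricks/gfsvol
> >>   gluster volume start gfsvol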
> >>
> >> Cloudstack Hypervisor Node:
> >> 1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6).
> >> 2.) Use the OFED driver from https://www.openfabrics.org
> >> 3.) Import the Gluster volume (see the sketch below).
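> >>
> >> (A minimal sketch of step 3; the host and volume names are
> >> placeholders, and the exact way to select the rdma transport depends
> >> on the GlusterFS version:)
> >>
> >>   mkdir -p /mnt/primary
> >>   # native client mount over RDMA; some versions use a ".rdma" volume
> >>   # suffix instead of the transport option
> >>   mount -t glusterfs -o transport=rdma gfs01:/gfsvol /mnt/primary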
> >>
> >> ZFS thin volume ---- Infiniband SRP ----> GlusterFS ---- GFS vol, rdma ---->
> >> IB switch ----> clients
> >>
> >> The ZFS and GlusterFS servers form one storage unit, connected
> >> directly with 40Gbit Infiniband point-to-point. You do not feel that
> >> there is a cable in between!
> >>
> >> Important: when you have Infiniband, do not use iSCSI over IPoIB! If
> >> you already have Infiniband, then you should also use its advantage:
> >> IPoIB has a higher latency than SRP!
> >>
> >> SRP latency (usec):
> >>
> >> -- SRP --
> >> local address:  LID 0x01 QPN 0x44004b PSN 0xf3265b RKey 0x9804237c VAddr 0x00000001dda000
> >> remote address: LID 0x0a QPN 0x10004a PSN 0x44072e RKey 0x1c0f115 VAddr 0x000000088e6000
> >> ------------------------------------------------------------------
> >> #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]
> >> 2       1000         1.29         125.30       1.31
> >> ------------------------------------------------------------------
> >>
> >> -- IPoIB --
> >> [root@sv01sfogaa ~]# ping 10.200.0.10
> >> PING 10.200.0.10 (10.200.0.10) 56(84) bytes of data.
> >> 64 bytes from 10.200.0.10: icmp_seq=1 ttl=255 time=0.147 ms
> >> 64 bytes from 10.200.0.10: icmp_seq=2 ttl=255 time=0.116 ms
> >>
> >> When you put load on IPoIB, the latency increases, and that is not good.
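> >>
> >> (For reference, numbers like the SRP table above come from the OFED
> >> perftest tools, e.g. ib_send_lat; a minimal sketch, with the peer's
> >> IPoIB address as a placeholder:)
> >>
> >>   # on the storage node (server side)
> >>   ib_send_lat
> >>   # on the other node (client side), pointing at the server
> >>   ib_send_lat 10.200.0.10
> >>   # compare with the IPoIB round-trip time
> >>   ping -c 10 10.200.0.10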
> >>
> >>
> >> That is my recommendation for a simple GlusterFS mirror:
> >>
> >> - Supermicro server with Intel hardware and expander backplane
> >> - 1x Crucial m4 SSD read cache
> >> - 2x ZeusIOPS SSD write cache (mirror)
> >> - SATA 24/7 hard drives
> >> - LSI HBA 9207 or 9211
> >> - ConnectX-2 QDR dual-port Infiniband adapter (HP refurbished with
> >>   full warranty for $100). Important: flash the newest firmware from
> >>   Mellanox!
> >> - Mellanox IB switch
> >> - Solaris 11
> >> - GlusterFS 3.3 compiled with ib_verbs
> >> - Gluster volume with transport rdma only
> >>
> >>
> >>
> >>
> >>
>
> >>> Throughput constant up to 200 MByte/s; more throughput with more
> >>> storage servers or more hard drives in the JBOD.
>
> >>
> >>
> >> Info:
> >>
> >> - I had some problems with Infiniband RDMA or SRP on OpenIndiana,
> >>   Illumos or Nexenta. Some adapters have high latency or no stable
> >>   connection. Use Solaris, that is the right way!
> >> - OpenIndiana is beta! Infiniband ib_verbs does not work, or does not
> >>   work well.
> >> - Use Solaris 11: Infiniband ib_verbs is native and stable.
> >> - Don't use Ubuntu client or server for Infiniband! Use RedHat,
> >>   Fedora or CentOS and install the right drivers from
> >>   https://www.openfabrics.org/downloads/OFED/
> >> - You have no SSD cache? Then disable sync on the ZFS volume (see the
> >>   sketch below). Important: you lose some data security, but some
> >>   protocols use sync flags in transport. For example, NFS uses fsync
> >>   by default, so the write cache is not active and NFS writes data
> >>   directly to the hard drive. For data security and performance, give
> >>   the storage server an SSD write cache. ZFS works by default with
> >>   sync=standard, which prevents write holes (COW system).
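> >>
> >> (A minimal sketch of the sync trade-off; the dataset name is a
> >> placeholder:)
> >>
> >>   # only if there is no SSD ZIL: trade data safety for write speed
> >>   zfs set sync=disabled tank/gfsvol
> >>   # with a ZeusRAM/SSD log device, keep the safe default
> >>   zfs set sync=standard tank/gfsvol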
> >>
> >> I hope that I could help a little
> >>
> >>
> >> Greetings from Germany
> >> Andreas
> >>
> >>
> >>
> >> ----- Original Message -----
> >>
> >> From: "Fabrice Brazier"
> >> To: cloudstack-users@incubator.apache.org
> >> Sent: Tuesday, 23 October 2012 09:55:15
> >> Subject: RE: Primary Storage
> >>
> >> Hi Andreas,
> >>
> >> Hello, I just saw your configuration; it seems quite interesting.
> >> If I understand correctly, you want to build ZFS arrays on the
> >> backend and export LUNs (probably via iSCSI over Infiniband) to your
> >> Linux cluster, and on the Linux cluster you put GlusterFS.
> >> I can understand the point: with that you can have very good
> >> performance and reliability (ZFS), plus scalability and redundancy
> >> (Gluster), for a very low cost.
> >> So just one question: did you try the global namespace
> >> implementation from Nexenta? If yes, can you tell me which
> >> configuration works best for you? I mean, the fact that you have a
> >> Gluster cluster in the middle must impact the overall performance,
> >> no?
> >>
> >> Fabrice
> >>
> >> -----Original Message-----
> >> From: Andreas Huser [mailto:ahuser@7five-edv.de]
> >> Sent: Tuesday, 23 October 2012 05:40
> >> To: cloudstack-users@incubator.apache.org
> >> Subject: Re: Primary Storage
> >>
> >> Hi,
> >>
> >> For CloudStack I use Solaris 11 ZFS + GlusterFS over Infiniband
> >> (RDMA). That gives the best performance and the most scalable
> >> storage.
> >> I have tested several different solutions for primary storage, but
> >> most are too expensive, not economical for a CloudStack cluster, or
> >> have poor performance.
> >>
> >> My configuration:
> >>
> >> Storage node:
> >> Supermicro server (Intel hardware) with Solaris 11, with SSD write
> >> and read cache (read: Crucial m4, write: ZeusIOPS), GlusterFS, and a
> >> dual-port ConnectX 40Gbit/s Infiniband adapter.
> >>
> >> I have installed GlusterFS directly on Solaris with modified code.
> >> If you want to build bigger systems for more than 50 VMs, it is
> >> better to split Solaris and GlusterFS, with a separate head node for
> >> GlusterFS.
> >>
> >> That looks like:
> >> Solaris ZFS backend storage with a dataset volume (thin provisioned)
> >> --> (SRP target attached directly, without an Infiniband switch, to
> >> the GlusterFS node) --> GlusterFS node: the SRP target is formatted
> >> with an xfs filesystem and a GlusterFS volume is created -->
> >> (Infiniband over a Mellanox port switch) --> CloudStack node mounts
> >> the GlusterFS volume over RDMA.
> >>
> >> For the dataset volume on the ZFS storage, disable atime and enable
> >> compression (space reclaim). With compression you can shrink the ZFS
> >> volume by writing zeros, with dd from /dev/zero on Linux or with
> >> sdelete in a Windows VM (see the sketch below). That gives you space
> >> back on the primary storage for files deleted inside a VM, or for
> >> vhd's or VMs deleted in CloudStack.
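> >>
> >> (A minimal sketch of the zero-fill trick; the paths are placeholders
> >> and the sdelete flag can differ between versions:)
> >>
> >>   # inside a Linux guest: fill free space with zeros, then delete
> >>   dd if=/dev/zero of=/zerofile bs=1M || true
> >>   rm -f /zerofile && sync
> >>   # inside a Windows guest the equivalent is roughly: sdelete -z C:
> >>   # the compressed zvol then stores the zeroed blocks as almost nothing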
> >>
> >> Greetings, Andreas
> >>
> >>
> >>
> >>
> >> Kind regards,
> >>
> >> Andreas Huser
> >> Managing Director
> >> System Engineer / Consultant
> >> (Cisco CSE, SMBAM, LCSE, ASAM)
> >> ---------------------------------------
> >> Zellerstraße 28 - 77654 Offenburg
> >> Tel: +49(781) 12786898
> >> Mobile: +49(176) 10308549
> >> ahuser@7five-edv.de
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >>
> >> From: "Outback Dingo"
> >> To: cloudstack-users@incubator.apache.org
> >> Sent: Tuesday, 23 October 2012 02:15:16
> >> Subject: Re: Primary Storage
> >>
> >> On Mon, Oct 22, 2012 at 8:09 PM, Ivan Rodriguez wrote:
> >>
> >>> Solaris 11 ZFS, and yes, we tried different setups: RAID levels,
> >>> number of SSD cache devices, ARC ZFS options, etc. etc.
> >>>
> >>> Cheers
> >>
> >> VMWare ??
>
> >>
> >>
> >>
> >>
>
