cloudstack-users mailing list archives

From bruc...@v365.com.au
Subject Re: Primary Storage - All those NEGATIVE TROLLS SHUT UP!
Date Wed, 24 Oct 2012 01:09:51 GMT
  

Oh well, if you're negative and make rude comments then that's your
problem, I don't care about you.

If you don't believe that RDMA can perform this fast then again, your
problem.

Ask anyone with a decent 12+ disc RAID array and RDMA and they will tell
you it will kick you in YOUR ballz! Anyone in storage knows that to get
IOPS and performance you need 12, preferably 16, spindles.

The spinning rust, as you call it, are the newest 200MB/s WD VelociRaptors
with 64MB cache, 10K RPM 2.5" discs in a 32-bay storage chassis.

We've used them in the past and they are as reliable as the SAS drives we
use, but faster!

RAID 10 IS BULLSHIT! Bloody slow, and 50% goes up in smoke for nothing.
All you get is the mirror speed of 2 drives, which is barely 400MB/s tops!
I spit on RAID 10.

We also tried RAID 60 and it was very good as well. But there is no point
using it, as we replicate to a 2nd SAN.

When we tested our new SANs, we failed 1 drive and set the rebuild rate to
30% on RAID 6, with a 40% scan rate. The SAN only had 6TB of data on it,
and it rebuilt in 3 hours. We removed a 2nd disc, replaced it with a
blank, and it was rebuilt in another 4 hours, with no real impact on the
performance tests.

Separating the underlying 8 x 6Gb/s SATA ports into effectively 16 x
3Gb/s channels using a SAS expander gives each disc 300MB/s of bandwidth.
They can't physically perform better than 200~220MB/s, so there is enough
bandwidth on the SATA and PCIe bus to cope. LSI rate the 9280-8i at
2500MB/s, but it does work faster with CacheCade: up to 3200MB/s in that
test.

So a real SAN has many spindles of high performance, and the WD VRs are
better than some cheap SSD drives.

RDMA is very fast, uses few CPU cycles, and reads and writes directly to
RAM at 40Gb/s. We created a RAM drive and tested it up to 3200MB/s, which
is as fast as the PCIe 2 bus / LSI could handle.
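For anyone who wants to reproduce that kind of throughput test, a rough
sketch with fio against a RAM-backed block device (module parameters,
device name and sizes below are examples only, not the exact test):

  modprobe brd rd_nr=1 rd_size=4194304    # creates a ~4GB /dev/ram0
  fio --name=ramseq --filename=/dev/ram0 --rw=read --bs=1M --direct=1 \
      --ioengine=libaio --iodepth=32 --runtime=30 --time_based \
      --group_reporting

Point --filename at the SRP-imported block device instead to measure the
same thing over RDMA.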

If you can't afford this relatively cheap equipment (compared to an IBM DS
or HP StorageWorks SAN) then don't whine at ME. You can't get fast
performance. Any real cloud would have what we have built. None of our
customers, ALL 300 of them, have lost a single bit of data, and all VMs
have been running very quickly, with no problems, for 3 months now.

So if you don't appreciate being shown how to build a kick-ass SAN then
PISS OFF!

On 24.10.2012 08:17, Jason Davis wrote:
> How often do you have folks cache missing and hitting the spinning rust
> below the cache cade tier?
> On Oct 23, 2012 6:25 PM, wrote:
>
>>
>> Hi
>>
>> With our SCST RDMA we have 40Gb/s QDR Gen 2 with Mellanox Grid Director
>> 4036 switches + HP Gen2 QDR chassis switches.
>>
>> Our SAN achieves 295,000 - 350,000 IOPS max per LSI controller,
>> depending on block size, in the IOMeter test. LSI 9280-8i with battery
>> cache + CacheCade 2.0 x 2 in a PCIe 16x slot on a Supermicro
>> motherboard.
>>
>> 1 x 250GB SSD for CacheCade +
>> 16 x 1TB WD VR drives (16 - hot standby - 2 for RAID 6 parity) =
>> 2600MB/s
>>
>> Most of our tests are from 4K to 128K block size. At 4K we get 295,000
>> IOPS @ 1590 MB/s, and at 128K 350,000 IOPS @ 2400~2600MB/s.
>>
>> We have tuned 64K and 128K block sizes on different LUNs: 64K for
>> database and 128K for general file.
>>
>> The best thing about Infiniband is low CPU cycles, only 5~6% during
>> these tests.
>>
>> Latency is as low as 4-6ms average read time. Concurrent response times
>> are from 5-12ms. Even under heavy load it is below 20ms.
>>
>> Infiniband latency is below 0.01 us, which is why we chose it.
>>
>> We run MSSQL on the 64K formatted LUNs; it is massively fast.
>>
>> If we copy the same data twice, the CacheCade kicks in and we achieve
>> even better speeds.
>>
>> I've compared this to block IO @ 8Gb/s Fibre Channel, and it barely
>> gets 120,000 IOPS, at much higher latency and bandwidth.
>>
>> So for our money RDMA wins!
>>
>> Bruce
>>
>> On 23.10.2012 21:20, Jason Davis wrote:
>> > Out of curiosity, are there any quick performance numbers for these
>> > ZFS + GlusterFS mashups you guys are talking about?
>> >
>> > Specifically, IOPS and latency? Sequential read/write performance
>> > honestly isn't a very good benchmark to determine your SAN's
>> > performance. It's like comparing CPUs based solely on how many GHz
>> > they run at. Sure you can get great MB or GB/s with SATA disk, but
>> > I'd reckon that IOPS performance is abysmal. If you are utilizing
>> > GlusterFS without the cache pooling magic that is ZFS then I would
>> > imagine that latency can be an issue.
>> >
>> > On Tue, Oct 23, 2012 at 7:56 AM, Andreas Huser wrote:
>> >
>>
>> >> Hi Fabrice,
>> >>
>> >> I know, OpenSolaris/Solaris and Oracle is quite a thing.
>> >> I have been an open source user for more than 10 years, and the
>> >> combination with Oracle was something I did not like at the
>> >> beginning. But Oracle does its work well, I know that. The cost is
>> >> $700 per socket and you can use as many TB as you want, and you can
>> >> use the full premier support from Oracle.
>> >> Nexenta develops with the Illumos code, and the licence is TB based.
>> >> That is not my favourite. Also, the pool version from Nexenta lags
>> >> behind. The current Nexenta Infiniband stack is not a usable
>> >> solution. But everyone can use what they want; everyone must decide
>> >> for themselves.
>> >>
>> >> SRP targets or iSER are not difficult to configure. Use SRP for the
>> >> storage unit connection. Solaris and GlusterFS together build one
>> >> storage unit. The GlusterFS server exports the final volume to the
>> >> clients, i.e. KVM, VMware, Hyper-V etc.
>> >> You can use native GlusterFS, RDMA, NFS or CIFS to export the
>> >> volume. SRP has nothing to do with VMware.
>> >>
>> >> When you use a 7200 rpm SAS drive the access time is the same as a
>> >> SATA drive, only the quality of the hardware is better. When you
>> >> need performance you must use SAS drives with 15000 rpm, but that is
>> >> not needed when you install SSDs for ZIL/L2ARC. ZeusRAM rocks :-)
>> >>
>> >> I use dedup only on secondary storage or on the backup server, not
>> >> on primary storage.
>> >> When you use SATA SSD drives then you have cheap and fast storage.
>> >> A 1TB drive costs under $100. Currently I do not need to save
>> >> storage volume.
>> >>
>> >> Which application uses atime? I know "find -atime N". atime on a
>> >> storage that stores only virtual disks? I don't need to know when I
>> >> last opened the disk :-)
>> >>
>> >> For a Solaris GUI take a look at napp-it:
>> >> http://www.napp-it.org/
>> >>
>> >> Greetings from Germany
>> >> Andreas
>> >>
>> >>
>> >> ----- Original Message -----
>> >>
>> >> From: "Fabrice Brazier"
>> >> To: cloudstack-users@incubator.apache.org
>> >> Sent: Tuesday, 23 October 2012 12:30:50
>> >> Subject: RE: Primary Storage
>> >>
>> >> Hi Andreas,
>> >>
>> >> Hmm, that's pretty cool. I know they still have trouble with
>> >> Infiniband on Nexenta but it's clearly a priority on their roadmap
>> >> (plus I trust the Nexenta team to evolve ZFS more than Oracle).
>> >>
>> >> I agree iSCSI over Infiniband increases the latency, but most of the
>> >> time it's just simpler to use IPoIB than native IB. For example, if
>> >> you use VMware, the Infiniband support is provided by Mellanox and
>> >> not VMware, so if you have an issue the VMware support probably
>> >> won't help you anymore.
>> >>
>> >> About the RAID, I'm a fan of RAID 10. I prefer to build a RAID 10
>> >> with 7200 rpm SAS drives rather than multiple raidz/raidz2 vdevs of
>> >> 15K SAS drives, particularly for a virtual environment with a random
>> >> workload and multiple accesses.
>> >>
>> >> I'm a fan of NFS, so agreed about the ZIL, and ZeusRAM is the only
>> >> option for that (with a classical SSD you can hit the write hole).
>> >>
>> >> Agreed for compression too (but only lzjb; the gzip levels use too
>> >> much CPU). Disabling atime helps decrease the IOPS load but I'm not
>> >> sure it is really a best practice. About deduplication I don't
>> >> totally agree: it really depends on your array, the workload and the
>> >> VM types on your CloudStack. Actually when I build a ZFS array I
>> >> count 1GB RAM per 1TB of disk. With deduplication I count 2GB RAM
>> >> per 1TB of disk (plus in Nexenta 4 they will add the possibility to
>> >> use SSD for the deduplication metadata, like the L2ARC for the read
>> >> cache).
>> >>
>> >> The last point is about your drives: why SATA drives? I mean SATA
>> >> doesn't respect its latency, and the reliability of a SAS drive is
>> >> 10x that of SATA. Plus you can now find many NL-SAS drives at low
>> >> cost.
>> >>
>> >> But it's really a nice architecture. I have never tried GlusterFS
>> >> (for the moment); plus in that case it's really a good way to have a
>> >> replacement for a metro-cluster for free. I tried one of its
>> >> competitors (OneFS) and clearly clustered filesystems are the
>> >> future.
>> >>
>> >> Cheers,
>> >> Fabrice
>> >>
>> >> -----Original Message-----
>> >> From: Andreas Huser [mailto:ahuser@7five-edv.de]
>> >> Sent: Tuesday, 23 October 2012 11:37
>> >> To: cloudstack-users@incubator.apache.org
>> >> Subject: Re: Primary Storage
>> >>
>> >> Hi Fabrice,
>> >>
>> >> I don't know what other people do, but I have no problems with
>> >> Infiniband + GlusterFS + Cloudstack. I do not use Nexenta; it is
>> >> based on Illumos and does not work well with Infiniband.
>> >> I have two different clusters in production environments.
>> >>
>> >> The first: Solaris 11 with built-in GlusterFS 3.3, exporting the
>> >> Gluster volume with RDMA. Performance is okay, you can use that for
>> >> smaller environments.
>> >> The second: a little bit more complex, with a GlusterFS server in
>> >> the middle.
>> >>
>> >> ZFS Server: based on Solaris 11
>> >> 1.) Create a zpool with a minimum of two vdevs and SSD read/write
>> >>     cache.
>> >> 2.) Create a thin provisioned volume ("zfs create -V"), disable
>> >>     atime and enable compression (do not enable dedup!), and export
>> >>     it as an (iWarp) SRP target to the GlusterFS server; use a
>> >>     direct connection without an IB switch. (A rough command sketch
>> >>     follows below.)
>> >>
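>> >> As a sketch, the pool and volume part could look something like
>> >> this on the Solaris box (pool, disk and volume names are examples
>> >> only; the zvol is then exported as a COMSTAR SRP LU):
>> >>
>> >>   zpool create tank mirror c0t1d0 c0t2d0 mirror c0t3d0 c0t4d0 \
>> >>     cache c0t5d0 log c0t6d0
>> >>   zfs set atime=off tank
>> >>   zfs create -s -V 2T -o compression=on tank/gfsvol
>> >>   stmfadm create-lu /dev/zvol/rdsk/tank/gfsvol
>> >>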
>> >> GlusterFS Server:
>> >> 1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6)
>> >> 2.) Use the OFED driver from https://www.openfabrics.org
>> >> 3.) Import the SRP target from the ZFS server and format it as xfs
>> >> 4.) Create a Gluster volume, "volume create xy transport rdma" (use
>> >>     only rdma); see the sketch below
>> >> 5.) Connect the second IB port to an IB switch
>> >>
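>> >> For steps 3 and 4 that could be roughly (device, brick and volume
>> >> names are examples only):
>> >>
>> >>   mkfs.xfs /dev/sdb                 # the imported SRP LUN
>> >>   mkdir -p /export/brick1
>> >>   mount /dev/sdb /export/brick1
>> >>   gluster volume create gfsvol transport rdma gfs01:/export/brick1
>> >>   gluster volume start gfsvol
>> >>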
>> >> Cloudstack Hypervisor Node:
>> >> 1.) Use RedHat, CentOS or Fedora (I use CentOS 5 and 6)
>> >> 2.) Use the OFED driver from https://www.openfabrics.org
>> >> 3.) Import the Gluster volume (see below)
>> >>
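>> >> Importing the volume on the hypervisor node is then just a
>> >> glusterfs mount, roughly like this (host, volume and mount point
>> >> are example names):
>> >>
>> >>   mount -t glusterfs -o transport=rdma gfs01:/gfsvol /mnt/primary
>> >>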
>> >> ZFS Thin Volume ---- Infiniband SRP ----> GlusterFS ---- GFSVol
>> >> rdma ----> IB Switch ----> Clients
>> >>
>> >> The ZFS and GlusterFS servers form a storage unit, connected
>> >> directly with 40Gbit Infiniband point-to-point. You do not feel
>> >> that there is a cable between them!
>> >>
>> >> Important: when you have Infiniband, do not use IPoIB with iSCSI!
>> >> If one already has Infiniband then one should also use its
>> >> advantage: IPoIB has a higher latency than iWarp SRP!
>> >>
>> >>
>> >> SRP latency is in the microsecond range:
>> >>
>> >> -- SRP --
>> >> local address:  LID 0x01 QPN 0x44004b PSN 0xf3265b RKey 0x9804237c
>> >> VAddr 0x00000001dda000
>> >> remote address: LID 0x0a QPN 0x10004a PSN 0x44072e RKey 0x1c0f115
>> >> VAddr 0x000000088e6000
>> >> ------------------------------------------------------------------
>> >> #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]
>> >> 2       1000         1.29         125.30       1.31
>> >> ------------------------------------------------------------------
>> >>
>> >> -- IPoIB --
>> >> [root@sv01sfogaa ~]# ping 10.200.0.10
>> >> PING 10.200.0.10 (10.200.0.10) 56(84) bytes of data.
>> >> 64 bytes from 10.200.0.10: icmp_seq=1 ttl=255 time=0.147 ms
>> >> 64 bytes from 10.200.0.10: icmp_seq=2 ttl=255 time=0.116 ms
>> >>
>> >> When you put load on IPoIB the latency increases, which is not
>> >> good.
>> >>
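>> >> Output in that format typically comes from the OFED perftest
>> >> tools; a comparable measurement can be taken roughly like this
>> >> (the address is an example):
>> >>
>> >>   ib_read_lat                    # on the target side
>> >>   ib_read_lat -s 2 10.200.0.10   # on the initiator, 2-byte messages
>> >>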
>> >>
>> >> That is my recommendation for a simple GlusterFS mirror:
>> >>
>> >> - Supermicro server with Intel hardware and expander backplane
>> >> - 1x Crucial m4 SSD read cache
>> >> - 2x ZeusIOPS SSD write cache (mirror)
>> >> - SATA 24/7 hard drives
>> >> - LSI HBA 9207 or 9211
>> >> - ConnectX-2 QDR dual-port Infiniband adapter (HP refurbished with
>> >>   full warranty for $100) - important: flash the newest firmware
>> >>   from Mellanox!
>> >> - Mellanox IB switch
>> >> - Solaris 11
>> >> - GlusterFS 3.3 compiled with ib_verbs
>> >> - Gluster volume transport only rdma
>> >>
>> >> Throughput is constant up to 200 Mbyte/s; more throughput with more
>> >> storage servers or more hard drives in a JBOD.
>> >>
>> >> Info:
>> >>
>> >> - I have had some problems with Infiniband RDMA or SRP on
>> >>   OpenIndiana, Illumos or Nexenta. Some adapters have a high
>> >>   latency or no stable connection. Use Solaris, that's the right
>> >>   way!
>> >> - OpenIndiana is beta! Infiniband ib_verbs do not work, or not
>> >>   well!
>> >> - Use Solaris 11: Infiniband ib_verbs are native and stable.
>> >> - Don't use Ubuntu for Infiniband clients or servers! Use RedHat,
>> >>   Fedora or CentOS and install the right drivers from
>> >>   https://www.openfabrics.org/downloads/OFED/
>> >> - You have no SSD cache? Disable sync on the ZFS volume! Important:
>> >>   you lose security for your data, but some protocols use sync
>> >>   flags in transport. For example NFS uses fsync by default: the
>> >>   write cache is not active and NFS writes data directly to the
>> >>   hard drive. For data security and performance, give the storage
>> >>   server an SSD write cache. ZFS works by default with
>> >>   sync=standard, which prevents write holes (COW system). (A
>> >>   one-line example follows below.)
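>> >>
>> >> For example, on the volume from the sketch above (the name is an
>> >> example):
>> >>
>> >>   zfs set sync=disabled tank/gfsvol
>> >>   zfs set sync=standard tank/gfsvol    # back to the safe default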
>> >>
>> >> I hope that I could help a little.
>> >>
>> >> Greetings from Germany
>> >> Andreas
>> >>
>> >>
>> >> ----- Original Message -----
>> >>
>> >> From: "Fabrice Brazier"
>> >> To: cloudstack-users@incubator.apache.org
>> >> Sent: Tuesday, 23 October 2012 09:55:15
>> >> Subject: RE: Primary Storage
>> >>
>> >> Hi Andreas,
>> >>
>> >> Hello, I just saw your configuration, it seems quite interesting.
>> >> If I understand well, you want to build a ZFS array on the backend,
>> >> export LUNs (probably by iSCSI over Infiniband) to your Linux
>> >> cluster, and on the Linux cluster you put GlusterFS.
>> >> I can understand the point: with that you can have very good
>> >> performance and reliability (ZFS), plus scalability and redundancy
>> >> (Gluster) for very low cost.
>> >> So just one question, did you try the global namespace
>> >> implementation from Nexenta?
>> >> If yes, can you tell me which configuration is the best for you?
>> >> I mean, the fact that you have a Gluster cluster in the middle must
>> >> impact the overall performance, no?
>> >>
>> >> Fabrice
>> >>
>> >> -----Original Message-----
>> >> From: Andreas Huser [mailto:ahuser@7five-edv.de]
>> >> Sent: Tuesday, 23 October 2012 05:40
>> >> To: cloudstack-users@incubator.apache.org
>> >> Subject: Re: Primary Storage
>> >>
>> >> Hi,
>> >>
>> >> For Cloudstack I use Solaris 11 ZFS + GlusterFS over Infiniband
>> >> (RDMA). That gives the best performance and the most scalable
>> >> storage. I have tested several different solutions for primary
>> >> storage, but most are too expensive, not economical for a
>> >> CloudStack cluster, or have poor performance.
>> >>
>> >> My configuration:
>> >>
>> >> Storage Node:
>> >> Supermicro server (Intel hardware) with Solaris 11, with SSD write
>> >> and read cache (read Crucial m4, write ZeusIOPS), GlusterFS and a
>> >> dual-port ConnectX 40Gbit/s Infiniband adapter.
>> >>
>> >> I have installed GlusterFS directly on Solaris with modified code.
>> >> If you build bigger systems for more than 50 VMs it is better to
>> >> split Solaris and GlusterFS, with a separate head node for
>> >> GlusterFS.
>> >>
>> >> That looks like:
>> >> Solaris ZFS backend storage with a dataset volume (thin
>> >> provisioned) --> (SRP target attached directly, without an
>> >> Infiniband switch, to the GlusterFS node) --> GlusterFS node, the
>> >> SRP target formatted with an xfs filesystem, create a GlusterFS
>> >> volume --> (Infiniband over a Mellanox port switch) --> Cloudstack
>> >> node mounts the GlusterFS volume over RDMA.
>> >>
>> >> For the dataset volume on the ZFS storage, disable atime and enable
>> >> compression. (Space reclaim) With compression you can shrink the
>> >> ZFS volume, with dd /dev/zero on Linux or with sdelete in a Windows
>> >> VM. That gives you space back on the primary storage for deleted
>> >> files in a VM, or for deleted vhd's or VMs in CloudStack.
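>> >>
>> >> On a Linux guest that zero-fill is simply something like:
>> >>
>> >>   dd if=/dev/zero of=/zerofile bs=1M; rm -f /zerofile
>> >>
>> >> (or "sdelete -z" in a Windows VM); with compression enabled on the
>> >> zvol, the zeroed blocks shrink back to almost nothing.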
>> >>
>> >> greetings Andreas
>> >>
>> >> Kind regards
>> >>
>> >> Andreas Huser
>> >> Managing Director
>> >> System Engineer / Consultant
>> >> (Cisco CSE, SMBAM, LCSE, ASAM)
>> >> ---------------------------------------
>> >> Zellerstraße 28 - 77654 Offenburg
>> >> Tel: +49(781) 12786898
>> >> Mobil: +49(176) 10308549
>> >> ahuser@7five-edv.de
>> >>
>> >> ----- Original Message -----
>> >>
>> >> From: "Outback Dingo"
>> >> To: cloudstack-users@incubator.apache.org
>> >> Sent: Tuesday, 23 October 2012 02:15:16
>> >> Subject: Re: Primary Storage
>> >>
>> >> On Mon, Oct 22, 2012 at 8:09 PM, Ivan Rodriguez wrote:
>> >>
>> >>> Solaris 11 ZFS, and yes we tried different setups, RAID levels,
>> >>> number of SSD cache, ARC zfs options etc etc etc.
>> >>>
>> >>> Cheers
>> >>
>> >> VMWare ??
>> >>

 