cloudstack-users mailing list archives

From Ivan Kudryavtsev <kudryavtsev...@bw-sw.com>
Subject Re: AW: AW: AW: KVM storage cluster
Date Fri, 02 Feb 2018 16:21:07 GMT
Andrija, indeed, amen!

On Feb 2, 2018, 11:14 PM, "Andrija Panic" <
andrija.panic@gmail.com> wrote:

> No, no other fio command needed; that one is fine. The critical settings
> are direct=1 and ioengine=libaio. I use a very similar setup, except that
> I prefer to do a pure READ pass and later a pure WRITE pass; I don't like
> these interleaved settings :)
> Also, the critical thing is not IOPS alone but also LATENCY (completion
> latency per IO) - make sure to check those numbers.
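The separate pure-read and pure-write passes suggested above, with completion latency reported, might look like the following fio invocations (editor's sketch; the device path, queue depth, and runtime are assumptions, not values from this thread):

```shell
# Sketch of separate read and write passes. direct=1 bypasses the page
# cache, libaio gives real async queue depth; completion latency (clat)
# percentiles appear in fio's output. /dev/scinia is a placeholder.

# Pure random READ pass:
fio --name=randread --readwrite=randread --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=64 --numjobs=4 --runtime=300 \
    --time_based --group_reporting --filename=/dev/scinia

# Pure random WRITE pass, same settings; compare clat percentiles
# (p99 / p99.9) between the two passes, not just the IOPS number:
fio --name=randwrite --readwrite=randwrite --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=64 --numjobs=4 --runtime=300 \
    --time_based --group_reporting --filename=/dev/scinia
```

These commands need root access and a real target device, so treat them as a template rather than something to paste verbatim.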
>
> Those 250,000 IOPS were reads and writes combined, so my bad, I did not
> read it correctly. That makes the number plausible: if you write to 6 SSDs
> at 35K IOPS each, the theoretical ceiling is still higher than what you
> measured, so you land under the spec (137K measured vs. almost 200K
> theoretical for writes), which sounds realistic and OK, I guess.
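The back-of-the-envelope math above can be written out explicitly (editor's sketch; the per-SSD spec and the measured figure are the numbers quoted earlier in the thread):

```shell
# Sanity-check the thread's numbers: 6 SSDs rated at 35,000 random-write
# IOPS each, versus the ~137,000 write IOPS fio reported.
ssds=6
iops_per_ssd=35000
ceiling=$((ssds * iops_per_ssd))      # theoretical aggregate ceiling
measured=137000
pct=$((measured * 100 / ceiling))     # fraction of the ceiling reached
echo "theoretical ceiling: ${ceiling} IOPS"
echo "measured writes:     ${measured} IOPS (${pct}% of ceiling)"
```

So the measured result sits at roughly two thirds of the theoretical maximum, which is consistent with "under the spec but realistic".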
>
> And yes, the volume size should be larger than RAM whenever RAM is used
> for any kind of buffering/caching, though I have no idea how this works
> with ScaleIO. With direct=1 you bypass the VM's/host's page cache and
> write directly to storage over the network, so that part is OK.
>
> If you run any other ScaleIO benchmarks or get more results later, I'm
> very interested to see them, since I have never played with ScaleIO :)
>
> Here is one article (if you can trust it...) showing some Ceph vs.
> ScaleIO differences:
> http://cloudscaling.com/blog/cloud-computing/killing-the-storage-unicorn-purpose-built-scaleio-spanks-multi-purpose-ceph-on-performance/
>
>
> Not meaning to start a war over which is better :), but Ceph definitely
> suffers on random IO. Even if you have 1,000 purely sequential
> streams/writes to storage, those 1,000 streams all end up interleaved,
> becoming effectively pure RANDOM IO on the storage side.
> We fought a long battle with Ceph, and for high-performance VMs it is
> simply not worth it.
> It is, though, an exceptionally nice storage system for streaming
> applications or massive scaling... again, just my 2 cents after 3 years
> in production.
>
> Whatever storage you choose, make sure you will not come to regret it on
> any of several factors - performance? good enough ACS integration? a
> stable enough libvirt driver (if used, e.g. librbd for Ceph)? vendor
> support? etc. - since this is the core of your cloud.
> Believe me on this :)
>
>
> On 2 February 2018 at 16:22, Ivan Kudryavtsev <kudryavtsev_ia@bw-sw.com>
> wrote:
>
> > I suppose Andrija is talking about the volume size; it should be much
> > bigger than the storage host's RAM.
> >
> > On Feb 2, 2018, 10:17 PM, "S. Brüseke - proIO GmbH" <
> > s.brueseke@proio.com> wrote:
> >
> > > Hi Andrija,
> > >
> > > you are right, of course it is a Samsung PM1633a. I am not sure if
> > > this is really only RAM, though. I let the fio command run for more
> > > than 30 minutes and the IOPS did not drop.
> > > I am using 6 SSDs in my setup, each rated at 35,000 random-write
> > > IOPS max, so ScaleIO can do 210,000 IOPS (read) at its best. fio
> > > shows around 140,000 IOPS (read) max. The ScaleIO GUI shows me
> > > around 45,000 IOPS (read/write combined) per SSD.
> > >
> > > Do you have a different fio command I can run?
> > >
> > > Mit freundlichen Grüßen / With kind regards,
> > >
> > > Swen
> > >
> > > -----Original Message-----
> > > From: Andrija Panic [mailto:andrija.panic@gmail.com]
> > > Sent: Friday, February 2, 2018 16:04
> > > To: users <users@cloudstack.apache.org>
> > > Cc: S. Brüseke - proIO GmbH <s.brueseke@proio.com>
> > > Subject: Re: AW: AW: KVM storage cluster
> > >
> > > From my admittedly short reading on ScaleIO a few months ago, they
> > > utilize RAM or similar for write caching: basically, you write to
> > > RAM or some other ultra-fast temporary memory (NVMe, etc.), and the
> > > data is later flushed to the durable part of the storage.
> > >
> > > I assume it's the 1633a, not 1663a? -
> > > http://www.samsung.com/semiconductor/ssd/enterprise-ssd/MZILS1T9HEJH/
> > > (?) That one can barely do 35K write IOPS per spec... and based on my
> > > humble experience with Samsung, you can hardly ever reach that
> > > specification, even with a locally attached SSD, a local filesystem,
> > > and a lot of CPU available...
> > >
> > > So it must be writing to RAM for sure... so make sure you saturate
> > > the benchmark enough that the flushing process kicks in, and that
> > > the benchmark remains meaningful for when you later have a constant
> > > IO load on the cluster.
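One way to make sure a RAM-backed write cache is saturated is to size the test data set well beyond host RAM. A minimal sketch of that rule of thumb (editor's note; the RAM size and multiplier below are assumptions, not values from this thread):

```shell
# Choose a fio --size comfortably larger than host RAM so any RAM-based
# write cache is forced to flush during the run.
ram_gb=128                              # assumed host RAM
factor=3                                # several times RAM, as a rule of thumb
test_size_gb=$((ram_gb * factor))
echo "host RAM: ${ram_gb} GB -> use --size=${test_size_gb}G or larger"
```

Combined with a long `--runtime`, this gives the flush path time to become the bottleneck, which is what a steady-state workload would see.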
> > >
> > > Cheers
> > >
> > >
> > > On 2 February 2018 at 15:56, Ivan Kudryavtsev <kudryavtsev_ia@bw-sw.com>
> > > wrote:
> > >
> > > > Swen, performance looks awesome, but I still wonder where the
> > > > magic is here, because AFAIK Ceph is not even capable of touching
> > > > those numbers, yet Red Hat bets on it... Might it be that ScaleIO
> > > > does not wait for replication to complete before acknowledging the
> > > > IO, or that some other trick is used?
> > > >
> > > > On Feb 2, 2018, 3:19 PM, "S. Brüseke - proIO GmbH" <
> > > > s.brueseke@proio.com> wrote:
> > > >
> > > > > Hi Ivan,
> > > > >
> > > > >
> > > > >
> > > > > it is a 50/50 read-write mix. Here is the fio command I used:
> > > > >
> > > > > fio --name=test --readwrite=randrw --rwmixwrite=50 --bs=4k \
> > > > >   --invalidate=1 --group_reporting --direct=1 --filename=/dev/scinia \
> > > > >   --time_based --runtime=9999 --ioengine=libaio --numjobs=4 \
> > > > >   --iodepth=256 --norandommap --randrepeat=0 --exitall
> > > > >
> > > > >
> > > > >
> > > > > The result was:
> > > > >
> > > > > IO workload: 274,000 IOPS
> > > > > Transfer: 1.0 GB/s
> > > > > Read bandwidth: 536 MB/s
> > > > > Read IOPS: 137,000
> > > > > Write bandwidth: 536 MB/s
> > > > > Write IOPS: 137,000
> > > > >
> > > > >
> > > > >
> > > > > If you want me to run a different fio command just send it. My lab
> > > > > is still running.
> > > > >
> > > > >
> > > > >
> > > > > Any idea how I can mount my ScaleIO volume in KVM?
> > > > >
> > > > >
> > > > >
> > > > > Mit freundlichen Grüßen / With kind regards,
> > > > >
> > > > >
> > > > >
> > > > > Swen
> > > > >
> > > > >
> > > > >
> > > > > *From:* Ivan Kudryavtsev [mailto:kudryavtsev_ia@bw-sw.com]
> > > > > *Sent:* Friday, February 2, 2018 02:58
> > > > > *To:* users@cloudstack.apache.org; S. Brüseke - proIO GmbH <
> > > > > s.brueseke@proio.com>
> > > > > *Subject:* Re: AW: KVM storage cluster
> > > > >
> > > > >
> > > > >
> > > > > Hi, Swen. Do you test with direct ops, or with cached/buffered
> > > > > ones? Is it a pure write test, or read/write with a certain mix
> > > > > percentage? I can hardly believe the deployment can do 250K
> > > > > write IOPS in a single-VM test.
> > > > >
> > > > >
> > > > >
> > > > > On Feb 2, 2018, 4:56, "S. Brüseke - proIO GmbH" <
> > > > > s.brueseke@proio.com> wrote:
> > > > >
> > > > > I am also testing with ScaleIO on CentOS 7 with KVM. With a
> > > > > 3-node cluster where each node has 2x 2 TB SSDs (Samsung
> > > > > PM1663a), I get 250,000 IOPS when doing a fio test (random 4k).
> > > > > The only problem is that I do not know how to mount the shared
> > > > > volume so that KVM can use it to store VMs on it. Does anyone
> > > > > know how to do it?
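The thread never answers this question directly. One approach that fits CloudStack's KVM storage model is its SharedMountPoint primary storage type, which expects the same filesystem mounted at the same path on every host. A heavily hedged, UNTESTED sketch (editor's note; the device name `/dev/scinia`, the choice of OCFS2, and the mount point are all assumptions, not anything confirmed in this thread):

```shell
# UNTESTED sketch: ScaleIO maps volumes as /dev/scini* block devices on
# each client host. Put a cluster-aware filesystem on the shared device,
# mount it identically on every KVM host, then register the path in
# CloudStack as SharedMountPoint primary storage.
mkfs.ocfs2 /dev/scinia                 # any cluster-safe fs; ocfs2 is one option
mkdir -p /mnt/sio-primary
mount /dev/scinia /mnt/sio-primary     # repeat this mount on every host
# finally, in the CloudStack UI/API: Add Primary Storage ->
#   Protocol: SharedMountPoint, Path: /mnt/sio-primary
```

A plain local filesystem (ext4/xfs) on a device shared between hosts would corrupt data; the filesystem must be cluster-aware, which is why GFS2/OCFS2 come up later in this thread.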
> > > > >
> > > > > Mit freundlichen Grüßen / With kind regards,
> > > > >
> > > > > Swen
> > > > >
> > > > > -----Original Message-----
> > > > > From: Andrija Panic [mailto:andrija.panic@gmail.com]
> > > > > Sent: Thursday, February 1, 2018 22:00
> > > > > To: users <users@cloudstack.apache.org>
> > > > > Subject: Re: KVM storage cluster
> > > > >
> > > > >
> > > > > a bit late, but:
> > > > >
> > > > > - for any IO-heavy (or even medium) workload, try to avoid Ceph.
> > > > > No offence; it simply takes a lot of $$$ to make Ceph perform
> > > > > well for random IO (notice that RHEL and the vendors provide
> > > > > only reference architectures with SEQUENTIAL benchmark
> > > > > workloads, not random), not to mention the huge list of bugs we
> > > > > hit back in the day (basically one single great guy handled the
> > > > > Ceph integration for CloudStack, but otherwise there was not a
> > > > > lot of help from other committers, if I'm not mistaken, afaik...)
> > > > > - NFS gives better performance, but no magic... (it is, however,
> > > > > the best supported option, code-wise and bug-wise :)
> > > > > - and for top-notch performance (costs some $$$), SolidFire is
> > > > > the way to go (we have tons of IO-heavy customers, so this is
> > > > > THE solution really, after living with Ceph, then NFS on SSDs,
> > > > > etc.), and it provides guaranteed IOPS etc...
> > > > >
> > > > > Cheers.
> > > > >
> > > > > On 7 January 2018 at 22:46, Grégoire Lamodière <g.lamodiere@dimsi.fr>
> > > > > wrote:
> > > > >
> > > > > > Hi Vahric,
> > > > > >
> > > > > > Thank you. I will have a look on it.
> > > > > >
> > > > > > Grégoire
> > > > > >
> > > > > >
> > > > > >
> > > > > > Sent from my Samsung Galaxy smartphone.
> > > > > >
> > > > > >
> > > > > > -------- Original message -------- From: Vahric MUHTARYAN
> > > > > > <vahric@doruk.net.tr> Date: 07/01/2018 21:08
> > > > > > (GMT+01:00) To: users@cloudstack.apache.org Subject: Re: KVM
> > > > > > storage cluster
> > > > > >
> > > > > > Hello Grégoire,
> > > > > >
> > > > > > I suggest you look at EMC ScaleIO for block-based operations.
> > > > > > It has a free version too! And for block storage it works
> > > > > > better than Ceph ;)
> > > > > >
> > > > > > Regards
> > > > > > VM
> > > > > >
> > > > > > On 7.01.2018 18:12, "Grégoire Lamodière" <g.lamodiere@dimsi.fr>
> > > > > > wrote:
> > > > > >
> > > > > >     Hi Ivan,
> > > > > >
> > > > > >     Thank you for your quick reply.
> > > > > >
> > > > > >     I'll have a look at Ceph and the related performance.
> > > > > >     As you mentioned, 2 DRBD NFS servers can do the job, but
> > > > > > if I can avoid using 2 blades just for passing blocks to NFS,
> > > > > > that is even better (and I don't have to maintain them either).
> > > > > >
> > > > > >     Thanks for pointing me to Ceph.
> > > > > >
> > > > > >     Grégoire
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >     ---
> > > > > >     Grégoire Lamodière
> > > > > >     T/ + 33 6 76 27 03 31
> > > > > >     F/ + 33 1 75 43 89 71
> > > > > >
> > > > > >     -----Original Message-----
> > > > > >     From: Ivan Kudryavtsev [mailto:kudryavtsev_ia@bw-sw.com]
> > > > > >     Sent: Sunday, January 7, 2018 15:20
> > > > > >     To: users@cloudstack.apache.org
> > > > > >     Subject: Re: KVM storage cluster
> > > > > >
> > > > > >     Hi, Grégoire,
> > > > > >     You could have:
> > > > > >     - local storage if you like, so every compute node has its
> > > > > > own space (one LUN per host)
> > > > > >     - Ceph deployed on the same compute nodes (distribute the
> > > > > > raw devices among the nodes)
> > > > > >     - a dedicated node as an NFS server (or two servers with
> > > > > > DRBD)
> > > > > >
> > > > > >     I don't think a shared FS is a good option; even clustered
> > > > > > LVM is a big pain.
> > > > > >
> > > > > >     2018-01-07 21:08 GMT+07:00 Grégoire Lamodière <g.lamodiere@dimsi.fr>:
> > > > > >
> > > > > >     > Dear all,
> > > > > >     >
> > > > > >     > Since Citrix deeply changed the free version of
> > > > > >     > XenServer 7.3, I am in the process of PoCing a move of
> > > > > >     > our Xen clusters to KVM on CentOS 7. I decided to use HP
> > > > > >     > blades connected to an HP P2000 over multipath SAS links.
> > > > > >     >
> > > > > >     > The network part seems fine to me, not so far from what
> > > > > >     > we used to do with Xen.
> > > > > >     > About the storage, I am a little bit confused about the
> > > > > >     > shared mountpoint storage option offered by CS.
> > > > > >     >
> > > > > >     > What would be the right option, in terms of CS, to
> > > > > >     > create a cluster FS using my SAS array?
> > > > > >     > I read somewhere (a Dag SlideShare, I think) that GFS2
> > > > > >     > is the only clustered FS supported by CS. Is that still
> > > > > >     > correct?
> > > > > >     > Does it mean I have to create the GFS2 cluster, make an
> > > > > >     > identical mount configuration on all hosts, and use it
> > > > > >     > in CS as NFS?
> > > > > >     > I do not have to add the storage to KVM prior to CS zone
> > > > > >     > creation?
> > > > > >     >
> > > > > >     > Thanks a lot for any help / information.
> > > > > >     >
> > > > > >     > ---
> > > > > >     > Grégoire Lamodière
> > > > > >     > T/ + 33 6 76 27 03 31
> > > > > >     > F/ + 33 1 75 43 89 71
> > > > > >     >
> > > > > >     >
> > > > > >
> > > > > >
> > > > > >     --
> > > > > >     With best regards, Ivan Kudryavtsev
> > > > > >     Bitworks Software, Ltd.
> > > > > >     Cell: +7-923-414-1515
> > > > > >     WWW: http://bitworks.software/ <http://bw-sw.com/>
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Andrija Panić
> > > > >
> > > > > - proIO GmbH -
> > > > > Geschäftsführer: Swen Brüseke
> > > > > Sitz der Gesellschaft: Frankfurt am Main
> > > > >
> > > > > USt-IdNr. DE 267 075 918
> > > > > Registergericht: Frankfurt am Main - HRB 86239
> > > > >
> > > > > Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
> > > > > Informationen.
> > > > > Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
> > > > > irrtümlich erhalten haben, informieren Sie bitte sofort den
> Absender
> > > > > und vernichten Sie diese Mail.
> > > > > Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail
> > > > > sind nicht gestattet.
> > > > >
> > > > > This e-mail may contain confidential and/or privileged information.
> > > > > If you are not the intended recipient (or have received this e-mail
> > > > > in
> > > > > error) please notify
> > > > > the sender immediately and destroy this e-mail.
> > > > > Any unauthorized copying, disclosure or distribution of the
> material
> > > > > in this e-mail is strictly forbidden.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Andrija Panić
> > >
> > >
> > >
> > >
> >
>
>
>
> --
>
> Andrija Panić
>
