crail-dev mailing list archives

From "Jonas Pfefferle" <>
Subject Re: Clarifying questions...
Date Tue, 09 Jul 2019 08:04:42 GMT
Hi David,

Good to hear things work now.
1) Technically, you can use the RdmaStorageTier "directly" with an SSD, since 
it allocates its data files in the configured datapath (and then mmaps them). 
This path is typically a hugetlbfs mount, but it can be a standard mount 
point. However, there are a few drawbacks to this approach: all IO is 
buffered, so you have no control over when data is actually written to the 
SSD, and since RDMA requires that all registered memory is pinned, you have 
to allocate as much memory as your SSD has capacity. So overall this is not 
really feasible.
My recommendation is to use the NVMf storage tier locally.
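To make the recommendation concrete, a minimal crail-site.conf sketch for running the NVMf tier alongside the DRAM tier might look like the following. The property names and tier class names are taken from the Crail documentation; the IP, port, and NQN values are placeholders you would replace with your own target's settings:

```
# crail-site.conf sketch (values are placeholders): register both the
# RDMA/DRAM tier and the NVMf tier, each mapped to its own storage class.
crail.storage.types      org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier
crail.storage.classes    2

# Where the Crail NVMf tier finds the local SPDK NVMf target
crail.storage.nvmf.ip    192.168.1.10
crail.storage.nvmf.port  4420
crail.storage.nvmf.nqn   nqn.2019-07.io.crail:cnode1
```

The slaves file then assigns each datanode instance to a tier/class with the `-t`/`-c` flags, as in the blog example you quoted below.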
2) Correct, at the moment that is the only way you can do this: start 
multiple instances of SPDK or use SPDK RAID0 if you just want to use 
multiple devices in the same storage class.
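For the RAID0 option, a sketch of the SPDK RPC calls might look like this. The RPC command names are the current `scripts/rpc.py` names and may differ on older SPDK releases; the PCIe addresses, IP, and NQN are assumptions for illustration, and a running `nvmf_tgt` process is required:

```shell
# Hypothetical setup: expose two local NVMe devices as a single RAID0
# bdev behind one NVMf subsystem, so one NvmfStorageTier datanode can
# use both devices in the same storage class.

# Attach the two local NVMe controllers as bdevs Nvme0n1 / Nvme1n1
scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:04:00.0
scripts/rpc.py bdev_nvme_attach_controller -b Nvme1 -t PCIe -a 0000:05:00.0

# Stripe them into one RAID0 bdev (64 KiB strip size)
scripts/rpc.py bdev_raid_create -n Raid0 -z 64 -r 0 -b "Nvme0n1 Nvme1n1"

# Create the subsystem, add the RAID0 bdev as its namespace,
# and listen on RDMA
scripts/rpc.py nvmf_create_subsystem nqn.2019-07.io.crail:cnode1 -a
scripts/rpc.py nvmf_subsystem_add_ns nqn.2019-07.io.crail:cnode1 Raid0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.crail:cnode1 \
    -t rdma -a 192.168.1.10 -s 4420
```

The alternative (multiple SPDK instances) is the same set of calls repeated against each target process, each with its own listener port.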

FYI, the shuffle plugin also supports configuring the storage class it should 
write to via "spark.crail.shuffle.storageclass" (set it in the Spark config).
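In spark-defaults.conf that would be a single line; the property name is as above, and the class index (here 1, the SSD class from the blog's example) must match your own crail-site.conf:

```
# Direct Spark shuffle output to Crail storage class 1
spark.crail.shuffle.storageclass  1
```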


  On Tue, 9 Jul 2019 01:05:37 +0000
  David Crespi <> wrote:
> Hi,
> I wanted to ask if there is a way of using a local SSD via the 
>RdmaStorageTier, so a couple of questions.
>From the blog example there were these three classes:
> crail@clustermaster:~$ cat $CRAIL_HOME/conf/slaves
> clusternode1 -t -c 0
> clusternode1 -t -c 1
> disaggnode -t -c 2
>  1.  Is there a way of using the RdmaStorageTier directly with an SSD 
>that is local to the server “clusternode1”?
> Or does the local SSD have to be included in an NVMf subsystem 
>on that local server, so that the NvmfStorageTier
> is used on that same server in order to access the SSD locally via 
>an NVMf subsystem?
>  2.  I asked a few days ago about how to use the same 
>Subsystem NQN, which I can’t do with a single
> instance of SPDK. Is this how using the same NQN is possible: that 
>different instances of SPDK would be used, one on each server (i.e. 
>clusternode1 & clusternode2), each with their own “version” of that 
>same Subsystem?
> BTW…
> I have my environment all running now, and all in containers. 
> Everything appears to be working as advertised.
> The spark shuffle seems to be filling up the memory tier, then 
>continuing on to the ssd tier.  Haven’t done anything
> over 300G yet, but it’s coming.  I’m clarifying the above to be sure 
>I’m not missing out on one of the configs.  I’m
> currently also using HDFS for the tmp results, as I only 
>have one instance of SPDK, so both
> NVMf classes 1 and 2 can’t exist for me (assuming the answers above, 
>that is 😊).
> Regards,
>           David
