incubator-tashi-dev mailing list archives

From Michael Stroucken <...@cmu.edu>
Subject Re: suspending machines in Tashi
Date Fri, 23 Mar 2012 18:44:44 GMT
Greg Ganger wrote:
>
> Yea, that would be the concern... perhaps it should be a config setting
> that one can set?  [Where the DFS is fast, go there directly... where
> not, stage through the local disk.]
I think the original issue, which Richard believes may have prompted this, 
may have gone away.

So if data has to end up on DFS anyway, sending it directly eliminates 
the double copy. Right now (on SVN trunk), the data is written completely 
to a local filesystem before it is sent to DFS.

In this case, the bottleneck is the local disk, so the double copy 
halves the throughput again. Let's say the state file is 100 GB, and 
the disk has a throughput of 40 MB/s. Currently, storing it would 
theoretically take 1.5 hours.

If we're saving directly to DFS, which has a throughput of about 70 
MB/s, it should theoretically be finished in 24 minutes.
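
As a rough check of the two estimates above, here is a small Python sketch 
of the arithmetic. It assumes decimal units (1 GB = 1000 MB), that each 
pass through the local disk runs at the full 40 MB/s, and that nothing 
overlaps; the function name is made up for illustration. Under those 
assumptions the staged path comes out at about 83 minutes, in the same 
range as the rough 1.5 hour figure, and the direct path at about 24 minutes.

```python
def transfer_minutes(size_gb, throughput_mb_s, passes=1):
    """Idealized time in minutes to move size_gb at throughput_mb_s.

    passes=2 models the double copy: the state file is written to the
    local disk and then read back from it before it reaches DFS.
    """
    size_mb = size_gb * 1000  # assumption: decimal units
    return passes * size_mb / throughput_mb_s / 60


# Numbers taken from the message: 100 GB state, 40 MB/s disk, 70 MB/s DFS.
staged = transfer_minutes(100, 40, passes=2)
direct = transfer_minutes(100, 70)

print(f"staged through local disk: {staged:.0f} min")
print(f"direct to DFS:             {direct:.0f} min")
```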

I am playing around with user-configurable suspend and resume handlers 
which could be free to stage things as they wish. In my case, they try 
to compress VM state in different ways.
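
To give an idea of what such a handler pair could look like, here is a 
minimal Python sketch. It is not the Tashi interface: the names 
suspend_handler and resume_handler, their arguments, and the idea that 
the hypervisor state arrives as a file-like stream are all assumptions 
made for illustration. The point is only that the state is compressed 
while it streams to its final destination, with no staging copy.

```python
import gzip
import io
import os
import shutil
import tempfile

BUFFER = 1024 * 1024  # copy in 1 MiB pieces


def suspend_handler(state_stream, dest_path, level=1):
    """Hypothetical handler: compress VM state while streaming it to dest_path.

    dest_path would point at the DFS mount; a low compression level
    keeps the CPU from becoming the bottleneck too quickly.
    """
    with gzip.open(dest_path, "wb", compresslevel=level) as out:
        shutil.copyfileobj(state_stream, out, BUFFER)


def resume_handler(src_path, state_sink):
    """Hypothetical counterpart: decompress the saved state into state_sink."""
    with gzip.open(src_path, "rb") as src:
        shutil.copyfileobj(src, state_sink, BUFFER)


# Round trip with fake state; a temporary directory stands in for DFS.
payload = b"\x00" * (4 * 1024 * 1024)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vm-state.gz")
    suspend_handler(io.BytesIO(payload), path)
    compressed_size = os.path.getsize(path)
    restored = io.BytesIO()
    resume_handler(path, restored)
```

A handler that prefers staging through the local disk would only differ 
in where dest_path points and in a later copy step.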

The state is very variable in nature; at times the hypervisor sends more 
data than can be processed by the CPU, at other times the storage system 
is the bottleneck.

For a VM with the above characteristics and uninitialized local storage, 
actual suspend times ranged from 130 minutes using the default gzip and 
double copying down to 45 minutes using parallel gzip and storing 
directly onto DFS.
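
For readers unfamiliar with the idea behind parallel gzip, here is a 
minimal Python sketch of the technique. It is not the tool used for the 
measurements above; the function name, chunk size, and worker count are 
arbitrary choices for illustration. The data is cut into chunks, each 
chunk is compressed as its own gzip member on a worker thread, and the 
members are concatenated, which is still a valid gzip stream.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data, chunk_size=1 << 20, workers=4):
    """Compress data as concatenated gzip members, one per chunk.

    zlib releases the GIL while compressing, so threads can use
    several cores. map() keeps the chunks in their original order.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(gzip.compress, chunks))


# Round trip: an ordinary gzip reader handles the multi-member stream.
sample = bytes(range(256)) * 20000  # about 5 MB of test data
packed = parallel_gzip(sample)
unpacked = gzip.decompress(packed)
```

The trade-off is a slightly worse compression ratio, since each chunk is 
compressed without knowledge of the others.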

Greetings,
Michael.

