hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: SSD vs Spinning disks
Date Tue, 11 Jun 2013 15:19:47 GMT
On Mon, Jun 10, 2013 at 5:44 PM, Lucas Stanley <lucas23145@gmail.com> wrote:

> If I understand HBase's architecture correctly, it is only the WAL that
> needs to be placed on a SSD to make writes perform better?

I'm skeptical when looking at the whole picture. Depending on which version
of HDFS you are using, and its configuration, writes to the WAL can be acked
by three datanodes (including one off rack, presumably under separate
power) after being received into memory, without waiting for fsync. This
already operates in the network and memory latency regimes, not that of
spinning media, so the benefit SSDs could provide here is maybe less than
one might think. For many use cases this persistence strategy is good
enough. The paranoid, who want to avoid *any* data loss upon total
datacenter power failure as far as possible, need to configure the
datanodes not to ack until fsync completes on the blocks in progress.
In that case I presume using SSDs will reduce the average latencies
involved, but SSDs can also have periods of terrible write latency caused
by garbage collection at the FTL layer and other reasons, with worst cases
I have heard of running upwards of 40 seconds. That's significantly worse
than worst cases for spinning media. Also, SSDs are susceptible to data
corruption upon sudden power loss. I've heard of solid state devices being,
surprisingly, totally or partially (as in a third of the device) bricked by
sudden power loss. If you think of FTLs as embedded custom filesystems of
varying maturity, maybe this shouldn't be so surprising. So even if fsync
completes on the SSD before power failure, you may still lose everything on it.
That's also a worst case worse than typical for spinning media. (How
frequent? Don't know. But I'm a pessimist by training.)
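
To make the latency argument above concrete, here is a back-of-envelope
sketch. The model and every number in it are assumptions for illustration,
not measurements: the client waits for roughly one network round trip per
datanode in the pipeline, and if datanodes sync before acking, the syncs are
assumed to overlap so only one is charged.

```python
def pipeline_ack_latency_ms(hop_rtt_ms, replicas=3, fsync_ms=0.0):
    """Rough time until the client sees the ack for one WAL packet.

    Assumed model: the packet is forwarded datanode to datanode, so the
    client waits for about `replicas` network round trips. If datanodes
    sync before acking, the syncs overlap and cost one `fsync_ms`.
    """
    return replicas * hop_rtt_ms + fsync_ms


# Placeholder figures only, chosen to show the shape of the trade-off.
HOP_RTT_MS = 0.5      # assumed in-datacenter round trip per pipeline hop
DISK_FSYNC_MS = 10.0  # assumed fsync cost on a spinning disk
SSD_FSYNC_MS = 1.0    # assumed fsync cost on an SSD on a good day

ack_from_memory = pipeline_ack_latency_ms(HOP_RTT_MS)
ack_after_disk_fsync = pipeline_ack_latency_ms(HOP_RTT_MS, fsync_ms=DISK_FSYNC_MS)
ack_after_ssd_fsync = pipeline_ack_latency_ms(HOP_RTT_MS, fsync_ms=SSD_FSYNC_MS)

print(ack_from_memory, ack_after_disk_fsync, ack_after_ssd_fsync)
```

With acks from memory the fsync term is zero, so the storage medium drops
out of the result entirely. It only starts to matter once acks wait on
fsync, and then the SSD worst cases described above apply as well.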

Taking a step back, you can turn off writes to the WAL selectively to make
an informed trade-off between performance and data loss risk on a per
application / per write basis, and administratively flush memstores on
demand for persistence, independent of the WAL. There are knobs available
for increasing write performance, depending on your tolerance for risk, in
the absence today of support for tiered storage in HBase/HDFS.
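
The trade-off can be sketched with a toy model. This is not HBase code: the
class ToyRegion and its methods are hypothetical names made up for the
illustration, and the recovery logic is heavily simplified. It only shows
which edits survive a crash when some writes skip the WAL and a flush
happens partway through.

```python
class ToyRegion:
    """Toy model of one region: a WAL, a memstore, and flushed files."""

    def __init__(self):
        self.wal = []        # durable log of edits not yet flushed
        self.memstore = {}   # in memory only, lost on crash
        self.hfiles = {}     # durable, written by flush()

    def put(self, key, value, write_to_wal=True):
        # Skipping the WAL saves the log write but leaves the edit in
        # memory only until the next flush.
        if write_to_wal:
            self.wal.append((key, value))
        self.memstore[key] = value

    def flush(self):
        # Persist everything in the memstore, whether or not it was logged.
        self.hfiles.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()     # flushed edits no longer need replay

    def crash_and_recover(self):
        # Memory is gone; only logged edits can be replayed.
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value

    def get(self, key):
        return self.memstore.get(key, self.hfiles.get(key))


region = ToyRegion()
region.put("a", 1)                      # logged
region.put("b", 2, write_to_wal=False)  # not logged
region.flush()                          # "b" becomes durable here
region.put("c", 3, write_to_wal=False)  # not logged, not flushed
region.put("d", 4)                      # logged
region.crash_and_recover()

survivors = {key: region.get(key) for key in "abcd"}
print(survivors)  # {'a': 1, 'b': 2, 'c': None, 'd': 4}
```

Only "c" is lost: it skipped the WAL and the crash came before any flush.
That window between an unlogged write and the next flush is the risk being
traded for write performance.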

On the other hand, random read workloads should benefit from having the
backing HFiles of hot read-mostly data placed into SSD storage. SSDs are
best for read heavy workloads in my opinion: there are long periods of time
without writes in which to reach a stable state, and they will live longer
the fewer writes they are subjected to. Random reads of working sets that
exceed the capacity of the blockcache are clearly impacted by the physical
limits of rotational media. Moving HBase storage from disk to SSDs able to
sustain orders of magnitude more read IOPS should produce a benefit: for a
given workload, the greater the difference between the IOPS disks can drive
and the IOPS SSDs can drive, the greater the potential benefit. We are doing
R&D in this area over at Intel and plan to publish experimental results in a few
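
As a rough illustration of the IOPS argument, the sketch below bounds the
random read rate a server can sustain when only blockcache misses reach the
devices. The per-device IOPS figures, device count, and hit ratio are
placeholder assumptions, not benchmark results, and the model assumes one
device I/O per cache miss.

```python
def sustainable_reads_per_sec(device_iops, devices, cache_hit_ratio):
    """Upper bound on random reads/s when only cache misses hit the devices.

    Assumes each cache miss costs exactly one device I/O and that load
    spreads evenly across devices.
    """
    miss_ratio = 1.0 - cache_hit_ratio
    if miss_ratio <= 0:
        return float("inf")  # everything is served from the blockcache
    return device_iops * devices / miss_ratio


# Placeholder figures for illustration only.
DISK_IOPS = 150       # assumed random read IOPS of one spinning disk
SSD_IOPS = 30000      # assumed random read IOPS of one SSD
DEVICES = 12          # assumed devices per server
HIT_RATIO = 0.5       # assumed blockcache hit ratio (working set > cache)

disk_bound = sustainable_reads_per_sec(DISK_IOPS, DEVICES, HIT_RATIO)
ssd_bound = sustainable_reads_per_sec(SSD_IOPS, DEVICES, HIT_RATIO)

print(disk_bound, ssd_bound, ssd_bound / disk_bound)
```

In this model the hit ratio cancels out of the comparison, so the ratio of
the two bounds is just the ratio of device IOPS. The hit ratio decides
whether the workload reaches the disk bound in the first place.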

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
