hadoop-common-user mailing list archives

From Rakesh Radhakrishnan <rake...@apache.org>
Subject Re: About Archival Storage
Date Tue, 19 Jul 2016 11:55:16 GMT
>>>> Does that mean I should configure dfs.replication to 1? And if it is more
than one, should I not use the *Lazy_Persist* policy?

The idea of the Lazy_Persist policy is that, while writing blocks, one replica
is placed in memory first and then lazily persisted to DISK. It doesn't mean
that you are not allowed to configure dfs.replication > 1. If dfs.replication
is configured > 1, the first replica will be placed on RAM_DISK and the other
(n-1) replicas will be written to DISK. Those (n-1) replicas still incur the
overhead of pipeline replication over the network and DISK write latency on
the write hot path, so you will not see better performance.

IIUC, to get the memory-latency benefits it is recommended to use
replication=1. That way, applications can perform single-replica writes to a
local DataNode with low latency. HDFS stores the block data in memory and
lazily saves it to disk, avoiding disk write latency on the hot path. Writing
to local memory also avoids checksum computation on the hot path.
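
As a sketch of that single-replica case (the file name, buffer size and block
size below are just placeholders), a write that requests LAZY_PERSIST at
create time with replication = 1 looks roughly like this:

import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/lazy-persist-data/part-0"); // hypothetical path
    short replication = 1;               // single replica for memory latency
    long blockSize = 128L * 1024 * 1024; // assumed block size
    try (FSDataOutputStream out = fs.create(
        file,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096,        // buffer size
        replication,
        blockSize,
        null)) {     // no progress callback
      // The block goes to the local DataNode's RAM_DISK and is lazily
      // persisted to DISK in the background.
      out.write("hello lazy persist".getBytes("UTF-8"));
    }
  }
}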

Regards,
Rakesh

On Tue, Jul 19, 2016 at 3:25 PM, kevin <kiss.kevin119@gmail.com> wrote:

> I don't quite understand :"Note that the Lazy_Persist policy is useful
> only for single replica blocks. For blocks with more than one replicas, all
> the replicas will be written to DISK since writing only one of the replicas
> to RAM_DISK does not improve the overall performance."
>
> Does that mean I should configure dfs.replication to 1? And if it is more
> than one, should I not use the *Lazy_Persist* policy?
>
