hadoop-mapreduce-user mailing list archives

From Rakesh Radhakrishnan <rake...@apache.org>
Subject Re: About Archival Storage
Date Wed, 20 Jul 2016 04:35:20 GMT
>>> I have another question: does the HDFS Mover (a new data migration
tool) know when to move data from hot to cold automatically?
When you run the tool, it reads its argument, a separated list of HDFS
files/dirs to migrate. It then periodically scans these files in HDFS to
check whether the block placement satisfies the storage policy; if not, it
moves the replicas to a different storage type to fulfill the storage
policy requirement. This cycle continues until it hits an error or there
are no more blocks to move. Could you please tell me what you mean by
"automatically"? FYI, HDFS-10285 proposes introducing a daemon thread in
the Namenode to track the storage movements requested by clients through
APIs. This daemon thread, named StoragePolicySatisfier (SPS), serves a role
similar to ReplicationMonitor. If you are interested, you can read the
proposal at https://goo.gl/NA5EY0; feedback is welcome.

The sleep time between cycles is ('dfs.heartbeat.interval' * 2000) +
('dfs.namenode.replication.interval' * 1000) milliseconds.
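
As a worked example of the formula above, here is a small Python sketch.
The function name is hypothetical, and the value of 3 seconds for each
property is an assumption (I believe it is the shipped default, but please
check the hdfs-default.xml of your release).

```python
def mover_sleep_ms(heartbeat_interval_s: int, replication_interval_s: int) -> int:
    """Sleep time between Mover cycles in milliseconds, per the formula above.

    Both arguments are the configured values in seconds.
    """
    return heartbeat_interval_s * 2000 + replication_interval_s * 1000


# Assumed values: dfs.heartbeat.interval = 3, dfs.namenode.replication.interval = 3
print(mover_sleep_ms(3, 3))  # 9000
```

With those assumed values the tool would wait 9 seconds between cycles.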

>>> Does it use an algorithm like LRU or LFU?
It simply iterates over the list in the order of the files/dirs given to
the tool as an argument. AFAIK, it just maintains the order specified by
the user.
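
To make the behaviour described in this message concrete, here is a toy
Python model of it. It is only a sketch and not the actual Mover code: the
class names, the policy table and the in-memory "move" are all hypothetical,
and the sleep between cycles is left out. It only shows that paths are
visited in the order given (no LRU/LFU ranking) and that cycles repeat until
no replica violates its storage policy.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical table: the storage type each policy wants
# for all of its replicas (HOT -> DISK, COLD -> ARCHIVE).
POLICY_STORAGE = {"HOT": "DISK", "COLD": "ARCHIVE"}


@dataclass
class Block:
    replicas: list  # storage type of each replica, e.g. ["DISK", "DISK"]


@dataclass
class HdfsFile:
    path: str
    policy: str
    blocks: list = field(default_factory=list)


def run_cycle(files):
    """One scan over the files, in the order given; returns replicas moved."""
    moved = 0
    for f in files:  # user-supplied order, no access-based ranking
        wanted = POLICY_STORAGE[f.policy]
        for block in f.blocks:
            for i, storage in enumerate(block.replicas):
                if storage != wanted:
                    block.replicas[i] = wanted  # stands in for a real replica move
                    moved += 1
    return moved


def run_mover(files, max_cycles=10):
    """Repeat scans until a cycle finds nothing to move; returns cycles run."""
    cycles = 0
    while cycles < max_cycles:
        cycles += 1
        if run_cycle(files) == 0:
            break
    return cycles


files = [
    HdfsFile("/data/old", "COLD", [Block(["DISK", "DISK", "DISK"])]),
    HdfsFile("/data/new", "HOT", [Block(["DISK", "DISK", "DISK"])]),
]
cycles = run_mover(files)
```

In this model nothing decides which data is "cold": the replicas of
/data/old move only because that path was already tagged with the COLD
policy before the run.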

Regards,
Rakesh


On Wed, Jul 20, 2016 at 7:05 AM, kevin <kiss.kevin119@gmail.com> wrote:

> Thanks a lot Rakesh.
>
> I have another question: does the HDFS Mover (a new data migration tool)
> know when to move data from hot to cold automatically? Does it use an
> algorithm like LRU or LFU?
>
> 2016-07-19 19:55 GMT+08:00 Rakesh Radhakrishnan <rakeshr@apache.org>:
>
>> >>>> Does that mean I should configure dfs.replication as 1? If it is
>> more than one, should I not use the *Lazy_Persist* policy?
>>
>> The idea of the Lazy_Persist policy is that, while writing blocks, one
>> replica is placed in memory first and then lazily persisted to DISK.
>> This does not mean that you are not allowed to configure dfs.replication
>> greater than 1. If 'dfs.replication' is greater than 1, the first replica
>> is placed in RAM_DISK and all the other (n-1) replicas are written to
>> DISK. Those (n-1) replicas incur the overhead of pipeline replication
>> over the network and the DISK write latency on the write hot path, so you
>> will not see a performance improvement.
>>
>> IIUC, for getting memory latency benefits, it is recommended to use
>> replication=1. In this way, applications should be able to perform single
>> replica writes to a local DN with low latency. HDFS will store block data
>> in memory and lazily save it to disk, avoiding disk write latency on the
>> hot path. By writing to local memory we can also avoid checksum
>> computation on the hot path.
>>
>> Regards,
>> Rakesh
>>
>> On Tue, Jul 19, 2016 at 3:25 PM, kevin <kiss.kevin119@gmail.com> wrote:
>>
>>> I don't quite understand :"Note that the Lazy_Persist policy is useful
>>> only for single replica blocks. For blocks with more than one replicas, all
>>> the replicas will be written to DISK since writing only one of the replicas
>>> to RAM_DISK does not improve the overall performance."
>>>
>>> Does that mean I should configure dfs.replication as 1? If it is more
>>> than one, should I not use the *Lazy_Persist* policy?
>>>
>>
>>
>
