hadoop-common-user mailing list archives

From kevin <kiss.kevin...@gmail.com>
Subject Re: About Archival Storage
Date Wed, 20 Jul 2016 05:00:16 GMT
Thanks again. By "automatically" I mean: does the hdfs mover know when hot
data has become cold, so that I don't need to tell it exactly which
files/dirs need to be moved now?
Of course, I would still tell it which files/dirs need monitoring.

2016-07-20 12:35 GMT+08:00 Rakesh Radhakrishnan <rakeshr@apache.org>:

> >>>I have another question: does the hdfs mover (A New Data Migration Tool)
> know when to move data from hot to cold automatically?
> When the tool runs, it reads its arguments and gets the separated list
> of hdfs files/dirs to migrate. Then it periodically scans these files in
> HDFS to check whether the block placement satisfies the storage policy; if
> it does not, it moves the replicas to a different storage type in order to
> fulfill the storage policy requirement. This cycle continues until it hits
> an error, has no more blocks to move, etc. Could you please tell me what
> you meant by "automatically"? FYI, HDFS-10285 is proposing an idea to
> introduce a daemon thread in the Namenode to track the storage movements
> set by APIs from clients. This daemon thread, named
> StoragePolicySatisfier (SPS), serves a role similar to ReplicationMonitor.
> If interested, you can read the proposal/idea at https://goo.gl/NA5EY0;
> feedback is welcome.
>
> The sleep time between cycles is ('dfs.heartbeat.interval' * 2000) +
> ('dfs.namenode.replication.interval' * 1000) milliseconds.
>
> >>>Does it use an algorithm like LRU or LFU?
> It simply iterates over the list in the order of the files/dirs given
> to the tool as arguments. AFAIK, it just maintains the order specified
> by the user.
>
> Regards,
> Rakesh
>
>
> On Wed, Jul 20, 2016 at 7:05 AM, kevin <kiss.kevin119@gmail.com> wrote:
>
>> Thanks a lot Rakesh.
>>
>> I have another question: does the hdfs mover (A New Data Migration Tool)
>> know when to move data from hot to cold automatically? Does it
>> use an algorithm like LRU or LFU?
>>
>> 2016-07-19 19:55 GMT+08:00 Rakesh Radhakrishnan <rakeshr@apache.org>:
>>
>>> >>>>Does that mean I should configure dfs.replication as 1? If it is
>>> more than one, should I not use the *Lazy_Persist* policy?
>>>
>>> The idea of the Lazy_Persist policy is that, while writing blocks, one
>>> replica is placed in memory first and then lazily persisted to DISK.
>>> This doesn't mean that you are not allowed to configure dfs.replication >
>>> 1. If 'dfs.replication' is configured > 1, then the first replica will be
>>> placed in RAM_DISK and all the other replicas (n-1) will be written to
>>> DISK. Here the (n-1) replicas will have the overhead of pipeline
>>> replication over the network and the DISK write latency on the write hot
>>> path, so you will not get better performance results.
>>>
>>> IIUC, for getting memory latency benefits, it is recommended to use
>>> replication=1. In this way, applications should be able to perform single
>>> replica writes to a local DN with low latency. HDFS will store block data
>>> in memory and lazily save it to disk avoiding incurring disk write latency
>>> on the hot path. By writing to local memory we can also avoid checksum
>>> computation on the hot path.
>>>
>>> Regards,
>>> Rakesh
>>>
>>> On Tue, Jul 19, 2016 at 3:25 PM, kevin <kiss.kevin119@gmail.com> wrote:
>>>
>>>> I don't quite understand this: "Note that the Lazy_Persist policy is useful
>>>> only for single replica blocks. For blocks with more than one replicas, all
>>>> the replicas will be written to DISK since writing only one of the replicas
>>>> to RAM_DISK does not improve the overall performance."
>>>>
>>>> Does that mean I should configure dfs.replication as 1? If it is more
>>>> than one, should I not use the *Lazy_Persist* policy?
>>>>
>>>
>>>
>>
>

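The mover cycle and sleep-time formula described in the quoted reply can be modelled roughly as below. This is only a toy sketch in Python, not the actual Mover implementation: the function names, the callbacks, and the `max_cycles` cap are hypothetical, and the default of 3 seconds for both 'dfs.heartbeat.interval' and 'dfs.namenode.replication.interval' is an assumption that should be checked against the hdfs-default.xml of your Hadoop version.

```python
import time
from typing import Callable, List, Sequence

# Assumed defaults in seconds (verify against hdfs-default.xml).
DFS_HEARTBEAT_INTERVAL = 3
DFS_NAMENODE_REPLICATION_INTERVAL = 3


def mover_sleep_ms(heartbeat_s: int = DFS_HEARTBEAT_INTERVAL,
                   replication_s: int = DFS_NAMENODE_REPLICATION_INTERVAL) -> int:
    """Sleep time between cycles, using the formula quoted in the thread."""
    return heartbeat_s * 2000 + replication_s * 1000


def run_mover(paths: Sequence[str],
              satisfies_policy: Callable[[str], bool],
              move_replicas: Callable[[str], None],
              sleep: Callable[[float], None] = time.sleep,
              max_cycles: int = 10) -> List[str]:
    """Toy model of the cycle: scan the paths in the order the user gave
    them, move whatever does not satisfy its storage policy, sleep, and
    repeat until nothing is left to move (or max_cycles is reached).

    Returns the paths that were moved, in the order they were processed.
    """
    moved: List[str] = []
    for _ in range(max_cycles):
        # No LRU/LFU ranking: the user-supplied order is preserved.
        pending = [p for p in paths if not satisfies_policy(p)]
        if not pending:
            break  # no blocks left to move
        for path in pending:
            move_replicas(path)
            moved.append(path)
        sleep(mover_sleep_ms() / 1000.0)
    return moved
```

With the assumed defaults the pause between cycles is 9000 ms. Note that nothing in this loop decides which data is "cold": the user still sets the storage policy on the paths and passes those paths to the tool.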