hadoop-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: hadoop/hdfs cache question, do client processes share cache?
Date Tue, 11 Aug 2015 20:35:52 GMT
Ritesh,

Many thanks for your response. I just read through the Centralized Cache
Management document; thanks for the pointer. A couple of follow-up questions.

First, the centralized cache requires explicit configuration, so by
default there is no HDFS-managed cache? Will caching still happen at the
local filesystem level, as it does on Linux?

Second question: the centralized cache lives on the DataNodes of HDFS. Let's
say the client is a standalone Linux machine (not part of the cluster) that
connects to an HDFS cluster with centralized cache configured, so the file
is cached on the cluster side. In this scenario, the client has 10 processes
repeatedly reading the same HDFS file. Will the HDFS client API be able to
cache the file content on the client side? Or will every read have to move
the whole file through the network, with no sharing between processes?
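To put numbers on the second question, here is a rough sketch of the network traffic involved. It assumes a 256MB file (borrowing the block size from the reply below) and 5 reads per process; both figures, the function name `network_bytes`, and the `shared_client_cache` flag are hypothetical illustrations, not part of any HDFS API. The "shared" branch models a hypothetical client-side cache that nothing in this thread confirms exists.

```python
# Hypothetical model of the scenario in the question: N client processes on
# one standalone machine each read the same HDFS file R times.
# Assumption: with no client-side cache shared between processes, every read
# pulls the full file from the cluster over the network.

def network_bytes(file_size: int, processes: int, reads_per_process: int,
                  shared_client_cache: bool) -> int:
    """Bytes transferred from the cluster to the client machine."""
    if shared_client_cache:
        # Hypothetical: the first fetch fills a cache that all processes share,
        # so the file crosses the network only once.
        return file_size
    # No sharing: every read by every process fetches the whole file.
    return file_size * processes * reads_per_process


MB = 1024 * 1024
size = 256 * MB  # assumed file size

print(network_bytes(size, 10, 5, shared_client_cache=False) // MB)  # 12800
print(network_bytes(size, 10, 5, shared_client_cache=True) // MB)   # 256
```

Under these assumptions the difference is 12800MB versus 256MB of transfer, which is why the answer to the client-side caching question matters.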

Demai


On Tue, Aug 11, 2015 at 12:58 PM, Ritesh Kumar Singh <riteshoneinamillion@gmail.com> wrote:

> Let's assume that HDFS maintains 3 replicas of the 256MB block. Then each
> of these 3 DataNodes will hold only one copy of the block in its memory
> cache, thus avoiding repeated I/O reads. This follows the centralized
> cache management policy of HDFS, which also gives you the option to pin
> only 2 of these 3 replicas in cache and save the remaining 256MB of cache
> space. Here's a link on the topic:
> <https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html>
>
> Hope that helps.
>
> Ritesh
>
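The cache-space arithmetic in the reply above can be checked with a short sketch. In centralized cache management, the number of replicas pinned in DataNode memory is set per cache directive (the `-replication` option of `hdfs cacheadmin -addDirective`). The helper `cache_usage_mb` below is a hypothetical name used only for illustration; it simply multiplies the block size by the number of cached replicas.

```python
# Worked arithmetic for the example above: one 256MB block stored with HDFS
# replication factor 3, of which some number of replicas are cached.

BLOCK_MB = 256


def cache_usage_mb(block_mb: int, cached_replicas: int) -> int:
    """Total DataNode cache consumed: one in-memory copy per cached replica."""
    return block_mb * cached_replicas


all_three = cache_usage_mb(BLOCK_MB, 3)  # every replica cached
only_two = cache_usage_mb(BLOCK_MB, 2)   # directive pins only 2 replicas
print(all_three, only_two, all_three - only_two)  # 768 512 256
```

Caching 2 of the 3 replicas uses 512MB of cluster-wide cache instead of 768MB, which is the 256MB saving described in the reply.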
