hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: DistributedCache - why not read directly from HDFS?
Date Sat, 23 Mar 2013 15:17:43 GMT
DistributedCache is used to distribute not just simple files but
also native libraries and the like, which cannot be loaded if they
are only on HDFS.

Also, reading it straight from HDFS could prove less performant, since
non-local reads can happen (depending on the file's replication factor).
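
To illustrate the native library point, here is a minimal sketch using
only the JDK (no Hadoop calls). It assumes the library would be loaded
with System.load(), which expects an absolute path on the local file
system. The HDFS URI and the class and method names are hypothetical,
made up for the example.

```java
import java.io.File;

public class NativeLoadSketch {

    /** Returns null if the load succeeded, otherwise the loader's error message. */
    static String tryLoad(String path) {
        try {
            System.load(path);
            return null;
        } catch (UnsatisfiedLinkError e) {
            // Thrown when the JVM cannot load a library from the given path.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical HDFS location of a native library.
        String onHdfs = "hdfs://namenode:8020/libs/libexample.so";

        // The JVM treats the string as a local path, and this one is not absolute.
        System.out.println("absolute local path? " + new File(onHdfs).isAbsolute());

        // The load fails because the library is not on the local file system.
        System.out.println("load error: " + tryLoad(onHdfs));
    }
}
```

Once the file has been copied to the task node's local disk, which is
what DistributedCache does, the same call can be given a real local
path instead.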

On Sat, Mar 23, 2013 at 8:23 PM, Alberto Cordioli
<cordioli.alberto@gmail.com> wrote:
> Hi all,
>
> I was not able to find an answer to the following question. If the
> question has already been answered please give me the pointer to the
> right thread.
>
> What are the actual differences between reading a file from HDFS in a
> mapper and using DistributedCache?
>
> I saw that with DistributedCache you can give an HDFS path and the
> task nodes will get the data on their local file system. But what
> advantages does that have over a simple HDFS read with the
> FSDataInputStream.open() method?
>
> Thank you very much,
> Alberto
>
>
> --
> Alberto Cordioli



-- 
Harsh J
